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1.  Final  Report  for  Years  1995-1998 

The  purpose  of  this  project  has  been  to  develop  Laplace  and  Low-Frequency  Helmholtz  Fast 
Multipole  Algorithms.  These  are  fundamental  building  blocks  (and  computational  logjams)  for 
circuit  simulation.  At  the  inception  of  the  project,  we  felt  that  the  following  steps  would  have 
to  be  undertaken  to  assure  its  eventual  success. 

1.  Design  the  improved  diagonal  forms  for  the  translation  operators  for  the  Laplace  and  low- 
frequency  Helmholtz  equations. 

2.  Design  improved  algorithms  for  the  construction  of  Generalized  Gaussian  Quadratures 
required  by  the  diagonal  forms  of  the  preceding  paragraph. 

3.  Design  and  implement  the  improved  FMM  schemes  for  the  Laplace  and  low-frequency 
Helmholtz  equation,  first  in  two  and  later  in  three  dimensions. 

4.  Design  the  numerical  machinery  for  the  interpolation  and  filtering  of  band-limited  functions 
on  the  sphere,  required  by  advanced  versions  of  the  Helmholtz  FMM. 

5.  Implement  the  final  versions  of  the  FMM  for  the  low-frequency  Helmholtz  equation,  both 
as  a  stand-alone  device  in  circuit  modeling  and  related  areas,  and  as  the  sub-wavelength  part 
of  a  high-frequency  modeling  code. 

The  project  has  been  in  existence  for  3  years  (as  per  our  proposal),  and  the  first  four  of 
the  above  five  steps  have  been  entirely  completed;  results  of  this  work  are  reported  in  the 
publications  [4],  [1],  [3],  [2],  [9],  [11],  [10],  [12].  Of  these,  [4],  [3],  [9],  [11],  [10],  [12]  have 
been  either  published  or  accepted  for  publication;  since  [1],  [2]  have  not  been  submitted  for 
publication  at  this  time,  their  copies  are  attached. 

The  fifth  step  has  been  in  pent  completed  at  FMAH,  in  part  by  our  collaborators  at  Boeing, 
and  in  part  it  is  being  completed  at  the  present.  Three  publications  are  in  preparation  reporting 
this  step.  The  following  personnel  have  been  involved  in  this  project:  R.  Coiftnan,  M.  Goldberg, 
L.  Greengaxd,  T.  Hagstrom,  M.  Israeli,  V.  Rokhlin,  J.  Stromberg. 

Below  is  a  description  of  our  effort  year-by-year. 
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Year  1 


1.  We  started  our  work  with  the  design  of  an  improved  version  of  the  Fast  Multipole  Method 
for  the  Laplace  equation  in  two  dimensions.  In  addition  to  having  its  own  applications,  this 
scheme  was  viewed  as  a  stepping  stone  to  three-dimensional  problems,  first  for  the  Laplace  and 
later  for  the  low-frequency  Helmholtz  equation.  This  work  was  completed  during  the  first  year 
of  the  project,  and  the  paper  reporting  it  recently  appeared  in  the  SISC  (see  [4]). 

2.  A  preliminary  (non-adaptive)  version  of  the  improved  FMM  for  the  Laplace  equation  in  three 
dimensions  was  designed  and  implemented.  The  algorithm  uses  diagonal  forms  of  translation 
operators  based  on  the  Generalized  Gaussian  Quadratures,  designed  by  us  specifically  for  this 
purpose  (and  to  be  used  in  similar  situations  that  might  arise  in  the  future).  At  this  stage, 
the  FMM  became  an  effective  tool  for  high-precision  potential  calculations  in  three  dimensions. 
On  the  other  hand,  the  scheme  implemented  at  this  time  was  not  adaptive,  and  its  memory 
requirements  were  deemed  excessive.  The  algorithm  is  described  in  [3].  As  often  happens,  by 
the  time  we  finished  [3],  we  had  constructed  several  additional  improvements  to  the  scheme.  It 
was  decided  to  incorporate  these  improvements  in  the  adaptive  version  of  the  algorithm;  the 
paper  [3]  does  not  contain  them. 

3.  The  improvements  in  the  design  of  Fast  Multipole  algorithms  described  in  the  preceding 
two  paragraphs  were  made  possible  by  the  existence  of  a  class  of  diagonal  representations  of 
translation  operators  (for  Helmholtz,  Laplace,  and  Yukawa  equations),  ultimately  based  on 
the  Generalized  Gaussian  Quadratures.  The  latter  have  been  known  for  a  very  long  time  (the 
oldest  references  we  are  aware  of  are  Markov’s  papers  at  the  end  of  the  last  century).  On  the 
other  hand,  the  classical  proofs  do  not  provide  any  mechanism  for  the  numerical  construction 
of  such  schemes.  Thus,  we  launched  an  effort  to  construct  an  algorithm  for  the  design  of 
such  quadratures.  The  effort  was  successful,  and  is  reported  in  [10];  at  that  time,  the  obtained 
quadratures  were  virtually  optimal  for  the  Laplace  equation,  in  both  two  and  three  dimensions. 
The  Helmholtz  and  Yukawa  equations  required  additional  quadrature  work. 

4.  “Fast”  schemes  for  the  application  of  discretized  integral  operators  are  a  critical  element  of 
an  effective  circuit  simulation  environment.  Another  critical  element  of  such  an  environment 
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is  our  ability  to  discretize  the  integral  equations  effectively.  Previously,  we  had  neglected  this 
aspect  of  the  problem;  during  the  first  year  of  the  project,  we  started  a  systematic  investigation 
of  relevant  discretization  schemes  and  associated  quadrature  formulae. 

Year  2 

1.  During  the  first  year  of  the  project,  we  had  constructed  a  preliminary  (non-adaptive) 
version  of  the  FMM  for  the  Laplace  equation  in  three  dimensions.  While  the  scheme  had  satis¬ 
factory  CPU  timings  for  all  accuracies  of  practical  importance,  its  memory  requirements  were 
deemed  excessive,  and  we  felt  that  the  design  of  translation  operators  we  used  could  be  im¬ 
proved.  Thus,  a  significant  amount  of  effort  was  contributed  to  the  design  of  an  improved  FMM 
for  the  Laplace  equation  (principally,  in  three  dimensions)  based  on  improved  representations 
of  the  potential;  we  have  also  designed  a  reformulation  of  the  algorithm  with  dramatically  re¬ 
duced  memory  requirements.  The  resulting  scheme  is  about  twice  faster  than  the  one  produced 
during  the  first  year  of  the  project,  and  its  memory  requirements  were  reduced  by  better  than  a 
factor  of  5.  The  report  describing  the  scheme  was  delayed,  and  appeared  only  in  1998  (see  [1]). 

2.  An  outstanding  problem  in  the  design  of  Fast  Multipole  Methods  for  the  Helmholtz  equation 
has  been  the  expense  associated  with  filtering  and  interpolation  of  functions  on  the  sphere.  For 
the  FMM  in  two  dimensions,  the  relevant  procedure  is  easily  performed  on  the  circle  via  the 
FFT  at  a  nearly  optimal  cost;  the  same  operation  on  the  sphere  can  not  be  implemented  via 
standard  algebraic  procedures.  During  the  second  year  of  the  project,  we  observed  that  there 
exists  a  version  of  the  FMM  in  one  dimension  that  performs  the  interpolation  and  filtering  on 
the  sphere  in  (asymptotically)  optimal  time;  it  should  be  mentioned  that  it  is  a  modification  of 
an  earlier  scheme  by  B.  Alpert  and  R.  Jakob-Chien  (see  [5]).  The  algorithm  was  implemented 
and  made  available  to  the  Boeing  and  Hughes  groups.  Reports  describing  our  new  design  of  the 
FMM  in  one  dimension  and  on  the  application  of  such  schemes  to  filtering  and  interpolation 
on  the  sphere  are  attached  (see  [11,  12]). 

3.  One  of  principal  purposes  of  this  project  has  been  the  design  of  a  version  of  the  FMM  for  the 
Helmholtz  equation  at  low  frequencies,  and  of  a  scheme  combining  such  a  version  with  the  high- 
frequency  FMM;  in  that  sense,  an  efficient  scheme  for  the  Laplace  equation  is  a  preliminary 
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step.  We  derived  the  integral  representations  analogous  to  those  used  in  the  recent  FMM 
scheme  for  the  Laplace  equation,  and  tested  it  with  a  preliminary  version  of  the  quadratures. 
A  report  [9]  describing  these  developments  is  attached. 

Year  3 

1.  We  have  concluded  the  design  of  special-purpose  quadratures  to  be  used  for  the  diagonal 
representation  of  potentials  for  the  low-frequency  FMM  for  the  Helmholtz  equation,  and  for  the 
Yukawa  equations  (both  in  two  and  three  dimensions).  The  scheme  we  have  constructed  are 
an  extension  of  the  techniques  we  constructed  in  [10]  and  used  in  [4],  [1]  to  design  an  FMM  for 
the  Laplace  equation  in  two  and  three  dimensions,  respectively;  we  believe  that  the  resulting 
implementations  are  more  or  less  optimal.  A  paper  reporting  this  work  is  in  preparation,  and 
we  are  we  are  enclosing  a  draft  [2]. 

2.  In  accordance  with  our  usual  practice,  we  have  implemented  preliminary  (non-adaptive) 
versions  of  the  Yukawa  and  low-frequency  Helmholtz  equations.  The  resulting  algorithms  are 
similar  (in  terms  of  speed,  accuracy,  and  complexity)  to  the  algorithm  for  the  Laplace  equation 
reported  in  [3].  The  report  describing  this  work  is  in  preparation. 

3.  To  some  extent,  during  the  final  stages  of  this  project,  we  reduced  our  concentration  on 
the  mechanics  of  the  FMM  itself,  and  intensified  our  investigation  of  collateral  issues,  such 
as  discretization  of  surfaces,  connection  with  the  recently  developed  time-domain  solvers,  the 
possibility  of  direct  solvers  in  the  frequency  domain,  issues  arising  in  the  application  of  our 
techniques  in  the  high-frequency  regime  (the  latter  being,  arguably,  outside  the  scope  of  this 
project).  The  principal  reason  for  this  redeployment  is  the  fact  that  the  FMM  in  its  original 
form  is  rapidly  becoming  an  accepted  tool  in  several  areas,  and  it  has  not  been  the  purpose  of 
this  project  to  design  algorithms  for  specific  applications. 
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We  present  a  non-linear  optimization  procedure  for  the  design  of  Generalized  Gaussian 
Quadratures  for  a  fairly  broad  class  of  functions.  While  some  of  the  components  of  the 
algorithm  have  been  published  previously,  we  introduce  an  improved  procedure  for  the 
determination  of  am  acceptable  initial  point  for  the  continuation  scheme  that  stabilizes  the 
Newton-type  process  used  to  find  the  quadratures.  The  resulting  procedure  never  failed 
when  applied  to  Chebyshev  systems  (for  which  the  existence  and  uniqueness  of  Generalized 
Gaussian  Quadratures  are  well-known);  it  also  worked  for  many  non-Chebyshev  systems, 
for  which  the  Generalized  Gaussian  Quadratures  (generally  speaking)  do  not  exist.  The 
performance  of  the  algorithm  is  illustrated  with  several  numerical  examples;  some  of  the 
presented  quadratures  integrate  efficiently  large  classes  of  singular  functions. 
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1  Introduction 


Quadrature  formulae  are  one  of  the  most  developed  areas  of  computational  mathematics.  They 
are  used  both  as  a  stand-alone  numerical  tool  for  the  evaluation  of  integrals,  and  as  an  analytical 
apparatus  for  the  design  of  interpolation  schemes,  finite  element  schemes,  etc.  Most  of  the 
quadrature  formulae  (at  least  for  functions  on  R1)  currently  in  use  can  be  separated  into  three 
groups: 

1.  Gaussian  quadratures  are  the  optimal  tool  for  the  evaluation  of  integrals  of  the  form 

[  u{t)  ■  P{t)dt,  (1) 

Ja 

where  P  is  a  polynomial  of  t  (or  a  function  well-approximated  by  a  polynomial),  and  w  is  a 
(more  or  less)  arbitrary  non-negative  function  [a,  6]  — »  R.  Gaussian  quadratures  are  extremely 
efficient,  mathematically  elegant,  and  easy  to  obtain  (see,  for  example  [3]);  whenever  applicable, 
they  tend  to  be  the  numerical  tool  of  choice. 

2.  Interpolator  quadrature  formulae  (Newton-Cotes,  etc.)  are  based  on  approximating  the 
integrand  by  some  standard  function  (usually,  a  polynomial),  and  integrating  the  latter.  These 
schemes  have  the  advantage  that  they  (usually)  do  not  prescribe  the  locations  of  the  nodes; 
they  tend  to  become  numerically  unstable  for  high  orders. 

3.  Miscellaneous  special-purpose  quadratures  (“product  integration  rules”,  non-standard  Richard¬ 
son  extrapolation,  etc.)  Eire  normally  used  when  the  situation  precludes  the  use  of  more  straight¬ 
forward  techniques. 

There  appears  to  exist  a  class  of  situations  where  classical  approaches  fail  to  produce  rapidly 
convergent  schemes.  Specifically,  suppose  that  we  wish  to  integrate  functions  of  the  form 

•  sk(x),  (2) 

Jfc=0 

where  <fik  are  smooth  functions  (or  polynomials)  mapping  [0, 1]  -+  R,  and  the  functions  sk  : 
[0, 1]  ->  R  are  known  apriori,  and  have  singularities  at  x  =  0.  In  many  situations  of  interest, 
the  functions  sk  have  different  singularities  at  x  =  0,  and  the  functions  <f>k  are  not  known  at  all; 
it  is  only  known  that  the  integrand  has  the  form  (2),  and  its  values  at  points  on  the  interval 
[0, 1]  can  be  evaluated.  While  efficient  quadratures  for  functions  of  the  form  (2)  would  have 
obvious  applications  in  the  solution  of  integral  equations,  in  numerical  complex  analysis,  and 
in  several  other  areas,  the  authors  have  failed  to  find  such  an  apparatus  in  the  literature. 

It  has  been  known  for  about  100  years  that  Gaussian  quadratures  admit  a  drastic  gener¬ 
alization,  replacing  polynomials  with  fairly  general  systems  of  functions  (see  [11,  12],  [2,  8], 
[6,  7]).  The  constructions  found  in  (see  [11,  12],  [2,  8],  [6,  7])  do  not  easily  yield  numerical 
algorithms  for  the  design  of  such  quadrature  formulae;  algorithms  of  this  type  were  designed 
(in  some  cases)  in  [10,  15],  where  the  resulting  quadrature  rules  are  referred  to  as  General¬ 
ized  Gaussian  Quadratures.  The  approach  is  based  on  the  observation  that  the  nodes  and 
weights  of  Gaussian  quadratures  satisfy  systems  of  non-linear  equations,  that  these  equations 
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have  unique  solutions,  and  that  when  polynomials  are  replaced  with  other  systems  of  functions, 
similar  systems  of  equations  are  easily  constructed.  While  for  functions  of  the  form  (2)  the 
resulting  equations  are  non-linear,  overdetermined,  and  non-unique,  in  the  least  squares  sense 
they  have  unique  solutions  under  surprisingly  general  conditions  (see  [10,  15]);  Newton-type 
methods  converge  in  this  environment,  provided  a  good  initial  approximation  can  be  found. 

As  often  happens,  in  the  absence  of  a  good  initial  approximation,  the  Newton  process  fails  to 
converge.  To  some  extent,  this  problem  is  remedied  by  the  use  of  continuation  techniques,  which 
turn  out  to  be  almost  always  available  when  designing  quadratures  for  integrands  (2).  However, 
yet  another  problem  is  frequently  encountered:  even  though  mathematically  the  solution  of  the 
non-linear  problem  is  unique  for  all  values  of  the  continuation  parameter,  numerically  it  is 
not  unique  at  all.  Once  the  (numerical)  rank  of  the  Jacobian  of  an  intermediate  problem 
is  sufficiently  low,  the  continuation  process  breaks  down;  attempts  to  use  globalized  search 
techniques  have  not  been  successful. 

The  final  step  in  the  design  of  a  robust  scheme  for  the  construction  of  Generalized  Gaussian 
Quadratures  is  described  in  Section  3.3.  It  finds  an  initial  approximation  for  which  the  Jacobian 
of  the  system  being  solved  has  am  acceptably  low  condition  number.  While  the  reasoning 
behind  this  step  is  partly  Heuristic,  in  our  experience  it  works  remarkably  well.  It  never  failed 
for  a  Chebyshev  system  (see  Section  2.1  below);  furthermore,  it  worked  for  most  of  the  non- 
Chebyshev  systems  we  tried  it  on.  For  a  more  detailed  discussion  of  our  numerical  experience, 
see  Section  5  below,  where  we  also  present  quadratures  for  functions  with  almost  general  power 
singularities  at  one  end  (or  both  ends)  of  the  interval  of  integration,  and  with  several  other 
types  of  singularities. 

The  paper  is  structured  as  follows.  Section  2  contains  mathematical  and  numerical  prelimi¬ 
naries.  In  Section  3,  we  build  the  numerical  apparatus  to  be  used  in  Section  4  to  construct  the 
procedure  for  the  determination  of  nodes  and  weights  of  Generalized  Gaussian  Quadratures. 
Section  5  contains  several  examples  of  quadratures  we  have  obtained.  Finally,  in  Section  6  we 
outline  several  possible  extensions  of  this  work. 


2  Mathematical  and  Numerical  Preliminaries 


2.1  Chebyshev  systems 

Definition  2.1  A  sequence  of  functions  will  be  referred  to  as  a  Chebyshev  system 

on  the  interval  [o,  6]  if  each  of  them  is  continuous  and  the  determinant 


<t>  lfai)  <t>i(xn) 


(3) 


I  ^n(^n)  I 

is  nonzero  for  any  sequence  of  points  xi,...,xn  such  that  a  <  x\  <  •  •  •  <  xn  ^ 


An  alternate  definition  of  a  Chebyshev  system  is  that  any  linear  combination  of  the  functions 
with  nonzero  coefficients  must  have  no  more  than  n  zeros. 

A  related  definition  is  that  of  an  extended  Chebyshev  system. 
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Definition  2.2  Given  a  set  of  functions  <j>x, . . . ,  <t>n  which  are  continuously  differentiable  on  an 
interval  [a,  b],  and  given  a  sequence  of  points  x\,...,xn  such  that  a  <  x\  <  X2  <  •  •  •  <  xn  <  b, 
let  the  sequence  m\, . . .  ,mn  be  defined  by  the  formulae 


mi  =  0, 

1 

II 

O 

ifj>l  and  Xj  ±  Xj-i, 

m,j  —  j  -  1 

if  j  >  1  and  Xj  =  Xj-\  =  ...  =  xx, 

mj  =  k 

if  j  >  k  +  1  and  Xj  =  Xj-i  —  ...  —  Xj 

matrix  C(x  1, . . . 

,xn)  =  [cy]  be  defined  by  the  formula 

(4) 


Cij  — 


<r*4n 


dxmj 


(5) 


in  which  (xj)  is  taken  to  be  the  function  value  <pi(xj).  Then  <fi\ , .  •  • ,  <fin  will  be  referred  to 
as  an  extended  Chebyshev  system  on  [a,  6]  if  the  determinant  |  , . . . ,  xn)|  is  nonzero  for  all 

such  sequences  x,. 

Remark  2.1  It  is  obvious  from  Definition  2.2  that  an  extended  Chebyshev  system  is  a  special 
case  of  the  Chebyshev  system.  The  additional  constraint  is  that  the  successive  points  xx  at  which 
the  function  is  sampled  to  form  the  matrix  may  be  identical;  in  that  case,  for  each  duplicated 
point,  the  first  corresponding  column  contains  the  function  values,  the  second  column  contains 
the  first  derivatives  of  the  functions,  the  third  column  contains  the  second  derivatives  of  the 
functions,  and  so  forth;  this  matrix  must  also  be  nonsingular. 

Examples  of  Chebyshev  and  extended  Chebyshev  systems  include  the  following  (additional 
examples  can  be  found  in  [7]). 

Example  2.1  The  powers  l,x,x2,...,xn  form  an  extended  Chebyshev  system  on  the  interval 
(-00,00). 

Example  2.2  The  exponentials  e~XlX ,  e~XiX, ...,  e~XnX  form  an  extended  Chebyshev  system  for 
any  Ai, . . . ,  An  >  0  on  the  interval  [0, 00). 

Example  2.3  The  functions  1, cos x, sin x, cos 2x, sin 2x,... , cos nx, sin nx  form  a  Chebyshev 
system  on  the  interval  [0, 27r). 


2.2  Generalized  Gaussian  quadratures 


The  quadrature  rules  considered  in  this  paper  are  expressions  of  the  form 


n 


Y,Wj-4>(xj) 

3=1 


(6) 


where  the  points  Xj  6  R.  and  coefficients  Wj  €  R.  are  referred  to  as  the  nodes  and  weights  of  the 
quadrature,  respectively.  They  serve  as  approximations  to  integrals  of  the  form 


4>(x)  cdot<jj{x)dx 


(7) 
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where  to  has  the  form 


(8) 


m 

to{x)  =  u>(x)  +  S(x  -  Xj), 

j= 1 

with  77i  a  non- negative  integer,  Co  :  [a,  6]  — >■  R.  an  integrable  non- negative  function,  Xi>  X2i  •  •  •  >  Xm 
points  on  the  interval  [a,  6],  fi\,  p2i  •  •  •  ■> positive  real  coefficients,  and  <5  the  Dirac  5-function 
on  It. 

Remark  2.2  Obviously,  (8)  defines  to  to  be  a  linear  combination  of  a  non-negative  function 
with  a  finite  collection  of  8 -functions  with  positive  coefficients.  In  a  mild  abuse  of  terminology, 
throughout  this  paper,  we  will  be  referring  to  to  as  a  non-negative  function. 

Quadratures  are  typically  chosen  so  that  the  quadrature  (6)  is  equal  to  the  desired  integral 
(7)  for  some  set  of  functions,  commonly  polynomials  of  some  fixed  order.  Of  these,  the  classical 
Gaussian  quadrature  rules  consist  of  n  nodes  and  integrate  polynomials  of  order  2n  1  exactly. 
In  [10],  the  notion  of  a  Gaussian  quadrature  was  generalized  as  follows: 

Definition  2.3  A  quadrature  formula  will  be  referred  to  as  Gaussian  with  respect  to  a  set  of 

2n  functions  <f> i, _ ,  <fan  ■  [o,  b]  —¥  lit  and  a  weight  function  to  :  [a,  b]  >  Ht"*-,  if  it  consists  of 

n  weights  and  nodes,  and  integrates  the  functions  fa  exactly  with  the  weight  function  to  for  all 
i  =  1, . . . ,  2n.  The  weights  and  nodes  of  a  Gaussian  quadrature  will  be  referred  to  as  Gaussian 
weights  and  nodes  respectively. 

The  following  theorem  appears  to  be  due  to  Markov  [11,  12];  proofs  of  it  can  also  be  found 
in  [8]  and  [7]  (in  a  somewhat  different  form). 

Theorem  2.1  Suppose  that  the  functions  fa, . . . ,  fan  ■  [a,  b]  -*  R  form  a  Chebyshev  system  on 
[a,  6],  Suppose  in  addition  that  to  :  [a,  6]  — >  R.  is  defined  by  (8),  and  that  either 

rb 

/  Co(x)dx  >  0,  (9) 

J  a 

or  m  >  n  (or  both).  Then  there  exists  a  unique  Gaussian  quadrature  for  fa,. fan  on  [a,  6] 
with  respect  to  the  weight  function  to.  The  weights  of  this  quadrature  are  positive. 

2.3  Quadrature  and  Interpolation 

As  is  well-known,  when  Gaussian  nodes  on  the  interval  [—1, 1]  are  used  for  interpolation  (for 
example,  via  the  Lagrange  formula),  the  resulting  procedure  is  numerically  stable.  Furthermore, 
the  precision  obtained  via  Gaussian  (Lagrange)  interpolation  is  almost  as  high  as  that  obtained 
via  Chebyshev  interpolation  (see,  for  example,  [4]).  Generally,  given  a  weight  function  to,  the 
nodes  of  Gaussian  quadratures  corresponding  to  to  lead  to  interpolation  formulae  that  are  stable 
in  an  appropriately  chosen  norm.  In  this  subsection,  we  formalize  this  fact  for  both  Gaussian 
and  many  Generalized  Gaussian  quadratures.  The  analytical  tool  of  this  subsection  is  the 
following  obvious  theorem. 
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Theorem  2.2  Suppose  that  the  function  u>  :  [a,b]  R  is  non-negative,  and  the  functions 
4>i,  <h,  ■  ■  •  1 4>n  •  [a,  6]  — y  R.  are  orthonormal  with  respect  to  the  weight  function  u,  i.e. 

f  u{x)  •  4>j{x)  •  4>i(x)dx  =  6ij  (10) 

Ja 

for  all  i,j  =  1,2, ...,n  (6ij  denotes  Kroneker’s  8-function).  Suppose  further  that  the  n-point 
quadrature  rule  X\,X2,.  •  ■  ,xn,  w\,u)2,  •  •  ■  ,u>n,  is  such  that  Wi  >  0  for  all  1  <  t  <  n.  Finally, 
suppose  that 

^Wk-fa (z/c)  •  <t>j (**)  =tij  ( 1 1 ) 

k= 1 

for  all  i,  j  =  1, 2, . . .  ,n.  Then  the  n  x  n-matrix  A  defined  by  the  formula 

Aij  =  y/uij  •  (^) 


is  orthogonal. 

Suppose  now  that  we  would  like  to  construct  an  interpolation  formula  on  the  interval  [a,  b] 
for  functions  of  the  form  n 

f(x)  =  '%2ai-<t>i(x),  (13) 

i=l 

with  ai,a2, . . . ,a„  arbitrary  real  coefficients.  In  other  words,  suppose  that  we  are  given  the 
values  /i,  fit  •  •  •  ■,  fn  of  a  function  /  at  a  collection  of  points  X\,X2, . . . ,  xn,  and  that  it  is  known 
that  /  is  defined  by  the  formula  (13),  but  the  coefficients  ai,a2, . . .  ,an  are  not  known;  we 
would  like  to  be  able  to  evaluate  /  at  arbitrary  points  on  [o,  6].  The  obvious  way  to  do  so  is  to 
observe  that  the  values  /i,  /2,  •  •  • ,  /n  are  linear  functions  of  the  coefficients  f*i,a2,  ...,an  (due 
to  (13));  evaluating  (13)  at  the  points  xi,x2, . . .  we  obtain  the  system  of  equations 

fj  =  (14) 

*=i 

with  j  =  1,2,..., n.  Defining  the  n  x  n-matrix  B  by  the  formula 

bj,i  =  <t>i{xj),  (15) 


we  rewrite  (14)  in  the  form 


F  =  Ba, 


(16) 


with  the  vectors  a,  F  €  Rn  defined  by  the  formulae 

a  =  (ai,a2,...,a„),  (17) 

F  =  {fi,  fa,  •  •  •  >  fn)-  (18) 

Now,  as  long  as  the  matrix  B  is  non-singular,  we  can  evaluate  the  coefficients  01,02, •••  i«n 
via  the  formula 

o  =  B~lF,  (19) 

and  use  (13)  to  evaluate  /  at  arbitrary  points  on  [0,6].  Of  course,  in  actual  numerical  calcula¬ 
tions,  it  is  not  sufficient  for  B  to  be  invertible;  its  condition  number  must  not  be  too  high.  The 
following  observation  is  the  principal  purpose  of  this  subsection. 
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Observation  2.3  Under  the  conditions  of  Theorem  2.2, 

A  =  Do  B,  (20) 

with  D  the  diagonal  matrix  defined  by  the  formula 

Di,i  =  y/uii,  (21) 

and 

a  =  A*DF  (22) 

(due  to  the  combination  of  (19)  with  (20)).  In  other  words,  given  the  table  of  values  /i,  /2,  •  •  •  i  fn 
of  the  function  f  at  the  nodes  xux2,...,  xn,  one  obtains  the  coefficients  of  the  expansion  (IS)  by 
applying  to  the  vector  F  the  product  of  two  matrices;  the  first  of  these  matrices  is  orthogonal,  and 
the  second  is  diagonal;  the  diagonal  elements  of  the  latter  are  square  roots  of  (positive)  weights 
of  the  n-point  quadrature  formula  exact  for  all  pairwise  products  of  the  functions  <t>i,<hw  An- 

Remark  2.4  While  at  first  glance  the  above  observation  appears  to  be  very  limited  in  its  scope 
(since  it  relies  on  the  quadrature  formula  being  exact  for  all  pairwise  products  of  the  functions 
fa,  fa, . . . ,  <f>n),  in  reality  it  means  that  whenever  the  nodes  of  a  Generalized  Gaussian  quadrature 
formula  are  used  as  interpolation  nodes,  the  resulting  interpolation  formula  tends  to  be  stable. 
The  reason  for  this  happy  coincidence  is  the  fact  that  the  matrix  A  (see  (12)  above)  )  need  not 
be  orthogonal  for  the  stability  of  the  interpolation  formula;  it  only  needs  to  be  well- conditioned. 
Thus,  as  long  as  the  quadrature  formula  is  reasonably  accurate  for  all  pairwise  products  of 
the  functions  fa,  fa,  ■ -  -An,  the  matrix  A  is  close  to  being  orthogonal;  therefore,  the  condition 
number  of  A  is  close  to  unity,  and  the  interpolation  based  on  the  nodes  x\,x2,  ■  ■  ■  ,xn  is  stable. 

2.4  Convergence  of  Newton’s  method 

In  this  section,  we  observe  that  the  nodes  and  the  weights  of  a  Gaussian  quadrature  satisfy  a 
simple  system  of  nonlinear  equations.  We  then  prove  that  the  Newton  method  for  this  system  of 
equations  is  always  quadratically  convergent,  provided  the  functions  to  be  integrated  constitute 


an  extended  Chebyshev  system. 

Given  a  set  of  functions  .,fan  and  a  weight  function  ui,  the  Gaussian  quadrature  is 

defined  by  the  system  of  equations 

ii 

jr 

S' 

*WS 

rb 

/  <f>i(x)  ■  u}{x)dx, 

Ja 

n 

Y)wj-Mxj)  = 
i= i 

rb 

/  faix)  ■  u{x)dx, 

Ja 

'EM* 

£ 

ll 

fb 

/  <hn(x)  -U}{x)dx, 

Ja 

(23) 
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(see  Definition  2.3).  Let  the  left  hand  sides  of  these  equations  be  denoted  by  f\  through  /2„. 
Then  each  fi  is  a  function  of  the  weights  w\,. . . ,  wn  and  nodes  xx , . . . ,  xn  of  the  quadrature. 
Its  partial  derivatives  are  given  by  the  obvious  formulae 


=  foM, 

OWi 

(24) 

=  wi  ■  4>’k{xi)- 

(25) 

Thus,  the  Jacobian  matrix  of  the  system  (23)  is 

/  •••  <Msn) 

wi4>[(xi) 

•••  wn(f>[{xn)  \ 

J{xi,...,xn,wi,...,wn)  =  :  : 

1 

.  (26) 

V  <hn(xi)  •••  <hn(Xn) 

«01$n(*l) 

•••  wn4>'2n{xn)  ) 

Lemma  2.3  Suppose  that  the  functions  <t>\,...,<hn  form  an  extended  Chebyshev  system.  Let 
the  Gaussian  quadrature  for  these  functions  be  denoted  by  t2)*  and  £%  ■  Then  the  determi¬ 
nant  of  J  is  nonzero  at  the  point  which  constitutes  the  Gaussian  quadrature;  in  other  words, 


Proof.  It  is  immediately  obvious  from  (26)  that 


|J(xi,...,x„,u>i,...,u>„)|  = 

W\  ■  U)2 . t&n-l  •  Wn  ' 


Mxi) 

fani&l) 


Mxn) 

<^2n(^n)  fan  (*l) 


<f>l  (®n) 


fan (®n) 


(27) 


If  fa, . . . ,  4>2n  form  an  extended  Chebyshev  system,  then  by  Theorem  2.1,  the  weights  wu . . . ,  wn 
of  the  Gaussian  quadrature  are  positive.  In  addition,  by  the  definition  of  an  extended  Chebyshev 
system,  the  determinant  in  the  right  hand  side  of  (27)  is  nonzero.  Thus 

|J(ii,...,x„,u>i,...,u>„)|  #0.  (28) 


□ 


Using  the  inverse  function  theorem,  we  immediately  obtain  the  following  corollary: 

Corollary  2.4  Under  the  conditions  of  Lemma  2.3,  the  Gaussian  weights  and  nodes  depend 
continuously  on  the  weight  function. 


2.5  Singular  value  decomposition 

The  singular  value  decomposition  (SVD)  is  a  ubiquitous  tool  in  numerical  analysis,  given  for 
the  case  of  real  matrices  by  the  following  lemma  (see,  for  instance,  [13]  for  more  details). 

Lemma  2.5  For  any  nxm  real  matrix  A,  there  exist,  for  some  integer  p,  annxp  real  matrix 
U  with  orthonormal  columns,  an  mx  p  real  matrix  V  with  orthonormal  columns,  and  apxp 
real  diagonal  matrix  S  —  [sy  ]  whose  diagonal  entries  are  non-negative,  such  that  A  =  U  •  S  •  V* 
and  that  su  >  Sj+iti+i  for  all  i  =  1, . . .  ,p  —  1. 

The  diagonal  entries  su  of  S  are  called  singular  values;  the  columns  of  the  matrix  V  are 
called  right  singular  vectors;  the  columns  of  the  matrix  U  are  called  left  singular  vectors. 
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2.6  Singular  value  decomposition  of  a  sequence  of  functions 

A  similar  decomposition  exists  (see  [5,  16])  if  the  columns  of  the  matrix  A  are  replaced  by 
functions: 

Theorem  2.6  Suppose  that  the  functions  ■  [a, 6]  ->  K-  are  square  integrable. 

Then  there  exist  a  finite  orthonormal  sequence  of  functions  ui,  t»2> . . . ,  Up  :  [o,  b]  -¥  R,  an  n  x  p 
matrix  V  —  [vy]  with  orthonormal  columns,  and  a  sequence  Si  >  S2  >  •  •  •  >  sP  >  0  €  ft,  for 
some  integer  p,  such  that 

<f>j{x)  -  Y^u (29) 

i=X 

for  all  x  £  [a,  6]  and  all  j  =  1, ,  n.  The  sequence  {sj}  is  uniquely  determined  by  K. 

By  analogy  to  the  finite-dimensional  case,  we  refer  to  this  factorization  as  the  singular  value 
decomposition.  We  refer  to  the  functions  {u;}  as  singular  functions,  to  the  columns  of  the 
matrix  V  as  singular  vectors,  and  to  the  numbers  {s»}  as  singular  values. 

A  popular  application  of  the  Singular  Value  Decomposition  is  for  the  purpose  of  “compress¬ 
ing”  data.  Specifically,  it  often  happens  that  while  the  total  number  n  of  functions  is  large, 
almost  all  of  the  coefficients  Sj  in  the  decomposition  (29)  are  negligibly  small.  In  such  cases, 
(29)  is  truncated  after  a  small  number  (say,  po)  of  terms,  and  the  resulting  expansion 

Po 

=  ]T  Ui(x)  ■  Si  •  Vij  (30) 

»=i 

is  viewed  as  a  compact  representation  of  the  original  family  of  functions  </>i,  <fo.  •  •  • ,  4>n- 

The  following  theorem  states  that  given  a  sequence  of  functions  on  the  interval  [a,  b],  their 
decomposition  of  the  form  (30),  and  a  quadrature  formula  with  positive  weights  on  the  interval 
[a,  6],  the  accuracy  of  the  quadrature  for  the  functions  <j>i,  $2,  •  •  • ,  <f>n  is  determined  by  its  accu¬ 
racy  for  the  singular  functions  uj,  corresponding  to  non- trivial  singular  values.  Its  proof  is  an 
exercise  in  elementary  linear  algebra,  and  is  omitted. 


Theorem  2.7  Suppose  that  under  the  conditions  of  Theorem  2.6,  e  is  a  positive  real  number, 
1  <  po  <  n  is  an  integer,  and 

i  *?  <  t-  <3i> 


t=PO+l 


Suppose  further  that  the  m-point  quadrature  formula  integrates  the  functions  Ui  exactly , 


m  -6 

J^Wj-Ui(xj)  =  /  «i(x)  dx 

i  Ja, 


for  all  i  =  1,2,...  ,po,  and  that  all  of  the  weights  w\, . .  .,wm  are  positive.  Then  for  each 
i  =  1,2, . . .  ,n, 


m  fb 

Wj  •  4>i{xj)  -  /  <f>i{x)  dx  <€■  \\<f>i\\L2. 

•  i  Ja 
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3  Numerical  Apparatus 

3.1  Continuation  method 

In  order  for  Newton’s  method  to  converge,  the  starting  point  provided  to  it  must  be  close  to  the 
desired  solution.  One  scheme  for  generating  such  starting  points  is  the  continuation  method, 
described  below. 

Suppose  that  in  addition  to  the  function  F  :  R"  -+  R"  whose  zero  is  to  be  found,  another 
function  G  :  [0, 1]  x  R"  -»•  R"  is  available  which  possesses  the  following  properties: 

•  1.  For  any  x  6  It", 

G(l,x)=F{x).  (34) 

•  2.  The  solution  of  the  equation 

G(0,  x)  =  0  (35) 

is  known. 

•  3.  For  all  t  G  [0, 1],  the  equation 

G(t,x)  =  0  (36) 

has  a  unique  solution  x  at  which  the  conditions  for  Newton’s  method  to  converge  are 
satisfied. 

•  4.  The  solution  a;  is  a  continuous  function  of  t. 

If  these  conditions  are  met,  an  algorithm  for  the  solution  of  the  equation 

F{x)  =  0  (37) 

is  as  follows.  Let  the  points  ti,  for  t  =  1, . . . ,  m,  be  defined  by  the  formula  £,•  =  i/m.  Solve  in 
succession  the  equations 

G(tux)  =  0, 

G{t2,x)  =  0, 

G(tm,x)  =  0  (38) 

using  Newton’s  method,  with  the  starting  point  for  Newton’s  method  for  each  equation  taken 
to  be  the  solution  of  the  preceding  equation.  Due  to  (34),  the  solution  x  of  the  final  equation 
G{tm,x)  =  0  is  identical  to  the  solution  of  (37);  obviously,  for  sufficiently  large  m,  Newton’s 
method  is  guaranteed  to  converge  at  each  step. 

Remark  3.1  In  practice,  it  is  desirable  to  choose  the  smallest  m  for  which  the  above  algorithm 
will  work,  in  order  to  reduce  the  computational  cost  of  the  scheme.  On  the  other  hand,  the  largest 
step  (U  —  ti_i)  for  which  the  Newton  method  will  converge  commonly  varies  as  a  function  oft. 
Thus  the  algorithm  described  in  this  paper  uses  an  adaptive  version  of  the  scheme. 
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3.2  Continuation  scheme 

The  continuation  scheme  used  is  as  follows.  Let  the  weight  functions  cv  :  [0, 1]  x  [a,  b]  — ►  be 

defined  by  the  formula 

n 

oj(a,x)  =  awi(x)  +  (1  —  a)  ^<f(x  —  Cj),  (39) 

i= i 

where  u>i  is  the  weight  function  for  which  a  Gaussian  quadrature  is  desired,  <5  denotes  the  Dirac 
delta  function,  and  the  points  Cj  €  [a,  6]  are  arbitrary  distinct  points.  These  weight  functions 
have  the  following  properties: 

•  1.  With  a  =  1,  the  weight  function  is  equal  to  the  desired  weight  function  u>i,  due  to 
(39). 

•  2.  With  a  =  0,  the  Gaussian  weights  and  nodes  are 

wj  =  1,  (40) 

Xj  =  Cj,  (41) 

for  j  =  1, . . .  ,n,  whatever  the  functions  <pi  are  (since  w(0,x)  =  0,  unless  x  =  Cj  for  some 
j  e  [1  ,n]). 

•  3.  The  quadrature  weights  and  nodes  depend  continuously  on  a  (by  Corollary  2.4). 

The  intermediate  problems  which  the  continuation  method  solves  are  the  Gaussian  quadratures 
relative  to  the  weight  functions  u>(a,  *).  The  scheme  starts  by  setting  a  =  0,  then  increases  a 
in  an  adaptive  manner  until  a  =  1,  as  follows.  A  current  step  size  is  maintained,  by  which  a 
is  incremented  after  each  successful  termination  of  Newton’s  method.  After  each  unsuccessful 
termination  of  Newton’s  method,  the  step  size  is  halved  and  the  algorithm  restarts  from  the 
point  yielded  by  the  last  successful  termination.  After  a  certain  number  of  successful  steps,  the 
current  step  size  is  doubled.  (Experimentally,  the  current  problem  was  found  to  be  well  suited 
to  an  aggressive  mode  of  adaption:  in  the  authors’  implementation,  the  initial  value  of  the  step 
size  was  chosen  to  be  0.5,  and  the  step  size  was  doubled  after  two  successful  terminations  of 
Newton’s  method.) 

3.3  Starting  points 

The  choice  of  the  points  cj  was  left  indefinite  above.  In  exact  arithmetic,  and  applied  to  a 
Chebyshev  system,  the  algorithm  would  converge  for  any  choice  of  distinct  points  (see  Lemma 
2.3).  However,  the  number  of  steps  of  the  continuation  method,  and  thus  the  speed  of  execution, 
is  affected  by  the  choice.  More  importantly,  the  numerical  stability  of  the  scheme  might  be 
compromised  due  to  poor  conditioning  of  the  matrix  J  (see  (26)).  Indeed,  while  Lemma  2.3 
guarantees  that  the  matrix  J  is  non-singular,  it  says  nothing  about  its  condition  number.  In 
addition,  we  will  be  applying  the  algorithm  to  cases  where  the  conditions  of  Lemma  2.3  are  not 
satisfied.  For  these  reasons,  the  following  method  of  choosing  the  starting  points  was  adopted. 
The  method  seeks  to  create  a  matrix  J  that  is  well-conditioned.  It  is  a  pivoted  Gram-Schmidt 
orthogonalization,  altered  to  operate  on  pairs  of  vectors: 
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.  1.  Choose  a  set  of  points  xux2,...,xm  on  the  interval  of  integration  [a,  6],  such  that  each 
of  the  functions  <f>i,  fa, .  - . ,  fa,  and  each  of  their  derivatives,  can  be  interpolated  on  [a,  b\ 
in  a  well-conditioned  manner  from  values  at  these  points. 

•  2.  Create  a  matrix  J,  of  the  same  form  as  (26),  where  the  points  {x^}  which  determine 
the  columns  are  the  points  chosen  in  step  1.  (This  matrix  thus  has  2m  columns.) 

•  3.  Perform  the  following  sequence  of  operations  n  times: 

-  a)  choose  the  point  Xj  for  which  the  two  columns  corresponding  to  Xj  have  the  largest 
size.  (The  issue  of  what  “size” to  use  is  discussed  below.) 

-  b)  orthogonalize  the  remaining  columns  to  both  of  those  two  columns. 

The  points  Xj  chosen  in  step  (3a)  are  then  the  starting  points  Cj  used  in  the  continuation 

method.  . ,  ,  , 

The  algorithm  as  specified  above  is  for  exact  arithmetic.  As  with  Gram-Schmidt,  the  al- 

gorithm  is  numerically  unstable,  but  can  be  stabilized  by  an  additional  re-orthogonalization: 
after  step  (3a),  re-orthogonalize  the  two  new  pivot  columns  to  all  of  the  previously  chosen  pivot 

columns. 

Remark  3.2  The  “ size  of  two  columns  ”  that  was  used  for  step  (3a)  is  the  sum  of  the  norms  of 
the  columns,  after  the  second  column  has  been  orthogonalized  to  the  first.  This  poses  the  obvious 
danger  that  one  of  the  two  columns  chosen  might  have  a  small  norm,  which  was  covered  up  by 
a  large  norm  of  its  companion.  This  would  render  it  unsuitable  for  pivoting;  this  danger  was 
never  realized  in  our  numerical  experiments,  but  if  it  were,  the  obvious  remedy  would  be  to 
attempt  to  change  the  definition  of  the  “size”.  The  authors  have  not  investigated  this  issue  in 

detail. 

3.4  Nested  Legendre  discretizations  of  finite  sequences  of  functions 

In  this  paper,  we  will  be  confronted  with  finite  sequences  of  functions  fa,  fa,...  fa  on  the 
interval  [a,  6],  possessing  the  following  properties: 

•  1.  The  total  number  n  of  functions  fa  is  reasonably  large  (e.g.  10000). 

•  2.  The  rank  of  the  set  fa,  fa,... fa,  is  low  (e.g.  40),  to  high  precision. 

•  3.  Each  of  the  functions  fa,  fa, . . .  fat  is  analytic  on  the  interval  [a,  6],  except  at  a  finite 
(small)  number  of  points;  fa  €  Ll[a,b]  for  all  t  =  1,2, ...  ,n. 

Now,  if  we  wish  to  handle  (interpolate,  integrate,  differentiate,  etc.)  numerically  functions 
of  the  form  „ 

fax)  =  fa’ 

t=l 

often  it  is  not  convenient  to  represent  them  by  collections  of  coefficients  £*1,0:2, ...an.  Indeed, 
if  the  functions  fa,  fa, . . .  fa  are  linearly  dependent,  the  number  of  coefficients  a*  necessary  to 
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represent  them  in  the  form  (42)  might  be  grossly  excessive,  compared  to  the  actual  complexity 
of  the  function  to  be  represented.  Furthermore,  the  coefficients  a*  by  themselves  provide  no 
mechanism  for  the  integration,  interpolation,  etc.  of  functions  of  the  form  (42);  each  time  such 
procedures  have  to  be  performed,  one  has  to  recompute  the  original  functions  <f> i,<fo,...$n. 
Since  the  latter  is  often  expensive  or  impossible,  it  is  desirable  to  have  a  purely  numerical  pro¬ 
cedure  for  representing  sums  of  the  form  (42).  Preferably,  the  scheme  should  use  no  information 
about  the  functions  <&,  except  for  their  values  at  a  finite  (preferably,  not  very  large)  collection 
of  points  on  [a,  6]. 

When  the  functions  <j>i  are  smooth,  a  widely  used  tool  for  representing  them  is  Chebyshev 
interpolation:  a  sufficiently  large  integer  m  is  chosen,  the  functions  <f>\ ,  ^2, . . .  <f>n  are  tabulated  at 
m  Chebyshev  nodes  on  [a,  6],  and  obtained  at  all  other  points  on  [a,  6]  via  standard  interpolation 
procedures.  While  Chebyshev  nodes  are  an  extremely  good  choice,  they  are  not  the  only 
one;  for  example,  Gaussian  (Legendre)  nodes  are  almost  as  efficient  as  the  Chebyshev  ones 
when  the  functions  are  to  be  interpolated,  and  twice  as  efficient  when  the  functions  are  to  be 
integrated  (see,  for  example,  [4]).  When  the  behavior  of  the  functions  fa  is  very  non-uniform 
over  the  interval  [a,  6],  Chebyshev  (Gaussian,  etc.)  interpolation  becomes  inefficient;  for  singular 
functions  it  is  liable  to  fail  completely.  In  such  cases,  adaptive  Chebyshev  interpolation  is  used, 
whereby  the  interval  is  subdivided  into  a  collection  of  subintervals,  so  that  on  each  subinterval, 
all  of  the  functions  fa  are  accurately  approximated  by  a  Chebyshev  expansion  of  low  order; 
needless  to  say,  most  of  the  time,  such  subdivisions  are  performed  automatically.  When  some 
(or  all)  of  the  functions  fa  have  singularities  on  the  interval  [a,  6],  schemes  of  this  type  cluster 
the  subintervals  near  each  singularity,  until  the  subinterval  nearest  to  the  singularity  is  so  small 
as  to  be  ignorable  for  the  purposes  of  the  calculations  to  be  performed. 

In  the  first  stage  of  the  algorithm  we  use,  we  build  a  nested  Chebyshev  discretization  of  the 
interval  [a,b]  for  each  of  the  functions  fa.  In  the  second  stage,  all  such  discretizations  are  merged 
to  obtain  a  single  discretization  by  which  all  of  the  functions  fa  are  adequately  represented.  In 
the  third  stage,  n  Legendre  nodes  are  constructed  on  each  of  the  obtained  intervals. 


Stage  1 


•  1.  Choose  the  precision  e  and  some  reasonably  large  m  (in  actual  computations,  we  use 
m  =  16). 


•  2.  Construct  the  m  Chebyshev  nodes  Xj0’^,  xj,0’^, . . .,  on  the  interval  [a,  6].  Evaluate 
<f>  at  the  nodes  x[a’^,  x^’b\  . . xln^,  obtaining  the  values  <f>^'b\  <$2>b\  •  •  4>m'b^- 


•  3.  Subdivide  the  interval  [a,  6]  into  the  subintervals  [a,  (a +  6) /2],  [(a  +  6)/2, 6].  Construct 
the  Chebyshev  nodes  x[a’^a+6^ ,  . . .,  1[£>(a+6)/2]  on  the  interval  [a,  (a  +  6)/2], 

and  the  Chebyshev  nodes  x[^ 0+t^2’^ ,  xjfa+6^2’^, . . .,  a.Kia+6)/2>6]  on  the  interval  [(a+6)/2, b]. 
Evaluate  the  function  <f>  at  the  nodes  x[a^a+t^2^,  ...,  Sto^  ^  Xj  ^ 

xf+b)l2'b\  ...,  X^a+fc)/2’i],  obtaining  the  values  ^a’(a+6)/2],  $'{a+b)/2\  ^’(a+6)/2], 

<j>[(a+l,),2Mi  respectively. 


[0,6] 


rMl 


4.  Interpolate  the  values  of  the  function  <f>  from  the  nodes  Xj“’6^,  x\ 
interval  [a,  b]  to  the  nodes  x[“’(a+6)/2],  x'a’(a+6)/2),  ...,  x£'(a+6)/2],  x^a+b)/2’b\  xf+b)/2'b\ 


Xm",  on  the 
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a4a+6)/2’6]  on  the  intervals  [a, (a  +  6)/2],  [(a  +  6)/2, 6J.  If  the  interpolated  values 

. ,  .  .  ...  .,  ,  ,[o,(o+fr)/2]  ,[a,(a+6)/2]  ,[o,(a+6)/2]  ,[(a+6)/2,6] 

agree  to  the  precision  e  with  the  values  <f>[  ,  <p2  \  . . . ,  <pm  ,  4>\  > 

^f(a+6)/2,6]  ^  ^  ^[(a+6)/2,t]  ca]cuiated  Erectly  in  Step  2  above,  the  algorithm  concludes 

that  the  function  <f>  is  adequately  resolved  by  the  m  Chebyshev  nodes  on  the  interval  [a,  6]; 
otherwise,  the  procedure  is  repeated  recursively  for  each  of  the  subintervals  [a,  (a  +  b)/ 2], 
[(a  +  b)/2,b). 


Stage  2 

•  1.  Store  the  ends  (left  and  right)  of  all  subintervals  in  all  subdivisions  in  a  single  array 
a.  Sort  the  elements  of  o;  remove  multiple  elements  in  a.  The  resulting  array  of  points 
on  the  interval  [a,  6]  (including  the  points  a,  b)  is  the  array  of  ends  of  subintervals  of  the 
final  subdivision. 


Stage  3 

•  1.  Construct  an  m-point  Legendre  discretization  of  each  of  the  subintervals  obtained  in 
Stage  2  above. 

Remark  3.3  In  the  algorithm  above,  we  use  Chebyshev  discretizations  in  Stage  1  to  construct 
the  subdivision  of  the  interval  [a,  6];  in  subsequent  calculations  we  use  Legendre  discretizations. 
The  reason  for  this  choice  is  that  the  interpolations  in  Stage  1  are  carried  out  more  efficiently 
with  Chebyshev  discretizations,  via  the  Discrete  Cosine  Transform  and  related  tools;  the  Leg¬ 
endre  discretizations  used  subsequently  lead  to  linear  interpolation  schemes  that  preserve  inner 
products  (see  following  subsection). 

Remark  3.4  The  scheme  of  this  subsection  is  a  fairly  reliable  apparatus  for  the  automatic 
discretization  of  sets  of  (more  or  less)  arbitrary  user-specified  functions.  While  it  is  very  easy 
to  construct  counterexamples  in  which  the  algorithm  will  fail  to  resolve  some  (or  all)  of  the 
input  functions,  this  problem  has  never  been  encountered  in  our  practice. 

3.5  Approximation  of  SVD  of  a  sequence  of  functions 

This  section  describes  a  numerical  procedure  for  computing  an  approximation  to  the  singular 
value  decomposition  of  a  sequence  of  functions.  The  algorithm  uses  quadratures  possessing  the 
following  property. 

Definition  3.1  We  will  say  that  the  combination  of  a  quadrature  and  an  interpolation  scheme 
preserves  inner  products  on  an  interval  [a,  b ]  if  it  possesses  the  following  properties. 

•  1.  The  nodes  of  the  quadrature  are  identical  to  the  nodes  of  the  interpolation  scheme. 

•  2.  The  function  which  is  output  by  the  interpolation  scheme  depends  in  a  linear  fashion 
on  the  values  input  to  the  interpolation  scheme. 
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•  3.  The  quadrature  integrates  exactly  any  product  of  two  interpolated  functions;  that  is, 
for  any  two  functions  f,g  :  [a,  6]  -4  R  produced  by  the  interpolation  scheme,  the  integral 

[  f(x)  •  g(x)dx  (43) 


is  computed  exactly  by  the  quadrature. 

Quadratures  and  interpolation  schemes  possessing  this  property  include. 

Example  3.1  The  combination  of  a  (classical)  Gaussian  quadrature  at  Legendre  nodes  and 
polynomial  interpolation  at  the  same  nodes  preserves  inner  products,  since  polynomial  interpo¬ 
lation  on  n  nodes  produces  an  interpolating  polynomial  of  order  n  -  1,  the  product  of  two  such 
polynomials  is  a  polynomial  of  order  2 n  -  2,  and  a  Gaussian  quadrature  integrates  exactly  all 
polynomials  up  to  order  2n  —  1. 

Example  3.2  If  an  interval  is  broken  into  several  subintervals,  and  a  quadrature  and  inter¬ 
polation  scheme  preserving  inner  products  is  used  on  each  subinterval,  then  the  arrangement 
as  a  whole  preserves  inner  products  on  the  original  interval.  (This  follows  directly  from  the 
definition.) 

Example  3.3  The  combination  of  the  trapezoidal  rule  on  the  interval  [0,27r],  and  Fourier  in¬ 
terpolation  (using  the  interpolation  functions  1,  cos  x, sinx,  cos  2x,  sin2x, . . . ,  cos  nx,  sinnx)  pre- 
serves  inner  products . 


The  algorithm  described  below  takes  as  input  a  sequence  of  functions  4>\,  <fc,  •  •  • ,  4>n  '•  K  &]  -4 
R.  It  uses  as  a  tool  a  quadrature  and  a  linear  interpolation  scheme  on  the  interval  [a,  b]  preserv¬ 
ing  inner  products;  the  weights  and  nodes  of  this  quadrature  will  be  denoted  by  wu . . . ,  wn  G  R 
and  xi, . . . ,  xn  6  [a,  b]  respectively.  As  will  be  shown  below,  the  accuracy  of  the  algorithm  is 
then  determined  by  the  accuracy  to  which  the  interpolation  scheme  approximates  the  functions 


<^lj  ^2)  •  •  •  i  tfin- 

The  output  of  the  algorithm  is  a  sequence  of  functions  ui, 
vectors  vi,...,vp  €  Rn,  and  a  sequence  of  singular  values  sj, 
mation  to  the  singular  value  decomposition  of  <f>\,  <f>2,  ■  ■  •  >  $n- 
Description  of  the  algorithm: 


,  Up  :  [a,  i>]  -4  R,  a  sequence  of 
,  sp  e  R,  forming  an  approxi- 


•  1.  Construct  the  n  x  m  matrix  A  —  [a^]  defined  by  the  formula 


aij  — 


(44) 


•  2.  Compute  the  singular  value  decomposition  of  A,  to  produce  the  factorization 

A  =  UoSoV*,  (45) 

where  U  =  [ujj]  is  an  n  x  p  matrix  with  orthonormal  columns,  V  =  [vy]  is  an  m  x  p 
matrix  with  orthonormal  columns,  and  S  is  a  p  x  p  diagonal  matrix  whose  j  th  diagonal 
entry  is  Sj. 
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•  3.  Construct  the  nxp  values  Uk{%i)  defined  by 

Uk(Xi)  =  Uik/y/wi.  (46) 

•  4.  For  any  desired  point  x  €  [a,  6],  evaluate  the  functions  uk  ■  [o,£>]  R  using  the 
interpolation  scheme  on  [a,  6]. 

The  proof  of  the  following  theorem  can  be  found  (in  a  considerably  more  general  form)  in 

[15]. 

Theorem  3.1  Suppose  that  the  combination  of  the  quadrature  and  interpolation  scheme  with 
weights  and  nodes  wi,...,wneR  and  xi,...,xn  E  [a,  b],  respectively,  preserves  inner  products 
on  [a,  6] .  For  any  function  K  :  [a,  6]  x  [c,  d]  ->  R,  let  Uj  :  [a,  6]  — ►  R,  Vy  £  R,  and  Sj  G  R  be 
defined  in  (44)-(46)>  for  all  i  =  Then 

•  1.  The  functions  Uj  are  orthonormal,  i.e. 

rb 

/  Ui{x)uk(x)dx  =  Sik  (47) 

Ja 

for  all  i,k  =  1, . . .  ,p,  with  Sik  the  Kronecker  symbol. 

•  2.  The  columns  of  V  are  orthonormal,  i.e. 

n 

53  vijvkjdx  =  8ik  (48) 

j'=i 

for  all  i,  k  =  1, . . .  ,p. 

•  5.  The  sequence  of  functions  <£i,  02, . . . ,  <^n  :  [a,  b]  — >  R  defined  by 

v 

<t>k(x)  =  ^2sjUj(x)vjk,  (49) 

;= i 

is  identical  to  the  sequence  of  functions  produced  by  sampling  the  functions  <f>\,<f>2,...,<pn 
at  the  points  {x*},  then  interpolating  with  the  interpolation  scheme  on  [a,  6]. 


4  Numerical  Algorithm 

This  section  describes  a  numerical  algorithm  for  the  evaluation  of  nodes  and  weights  of  gener¬ 
alized  Gaussian  quadratures.  The  algorithm’s  input  are  a  sequence  of  functions  <f> i, . . .  ,<fcn  : 
[a,  6]  -4  R,  and  the  precision  e  to  which  the  quadratures  are  to  be  calculated;  its  output  is  the 
weights  and  nodes  of  the  quadrature.  The  functions  fa  are  supplied  by  the  user  in  the  form  of 
a  subroutine,  with  input  parameters  (x,t),  and  output  parameter  fa{x).  The  algorithm  uses 
the  components  described  the  preceding  section. 

•  1.  The  interval  [a,  6]  is  discretized  via  the  scheme  described  in  Subsection  3.4,  so  that  all 
functions  fa,  fa, . . . ,  <pn  are  represented  to  the  precision  e. 
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•  2.  All  of  the  functions  (f> i,  <fc,...,  <fin  are  tabulated  at  the  nodes  of  the  discretization 
obtained  in  p.  1  above,  and  the  Singular  Value  Decomposition  is  obtained  of  the  sequence 
of  functions  <f>i,  fa,  •  •  ■ ,  4>n  via  the  scheme  described  in  Subsection  3.5;  we  will  be  denoting 
the  obtained  singular  values  by  Ai,  A2, . .  •  • 

•  3.  Denoting  by  fc  the  positive  integer  number  such  that  A2.*+i  <  e  <  A2.jfc.-1,  we  observe 
that  any  quadrature  formulae  with  positive  coefficients  that  integrates  the  obtained  sin¬ 
gular  functions  u  1,  U2, . . .  U2  k  exactly,  will  integrate  all  of  the  functions  <j>  1,  <^*2,  -  -  - ,  <pn 
with  precision  e  (see  Theorem  2.7  in  Subsection  2.5).  The  remainder  of  the  algorithm  is 
devoted  to  constructing  a  fc-point  quadrature  formula  that  will  integrate  the  functions  ui, 
u2,  •  ■  •  u2 -jt  exactly. 

•  4.  The  scheme  of  Subsection  3.3  is  used  to  find  the  starting  nodes  z§, f°r  the 
continuation  process  of  Subsection  3.2. 

•  5.  An  adaptive  version  of  the  continuation  method  of  Subsection  3.2  is  used  to  obtain  the 
fc-point  quadrature  for  the  functions  «i,  i»2,  •  •  . ,  ti2  Jfc!  on  each  step,  the  Newton  algorithm 
described  in  Subsection  2.4  is  used  to  solve  the  system  (23)  defining  the  nodes  and  roots 
of  the  quadrature  formula. 

Remark  4.1  We  would  like  to  reiterate  that  the  quadrature  formulae  produced  by  the  procedure 
of  this  section  do  not  integrate  the  user-specified  functions  <j>  1,  (j> 2, . . . ,  4>n  exactly;  instead,  they 
produce  approximations  to  the  integrals.  Needless  to  say,  the  two  are  indistinguishable,  as  long 
as  the  chosen  precision  e  is  less  than  the  machine  precision. 


5  Numerical  examples 

A  variety  of  quadratures  were  generated  via  the  algorithm  of  this  paper;  several  of  these  are 
presented  below  to  illustrate  its  performance.  In  Examples  5.1,  5.2,  the  calculations  were 
performed  in  extended  precision  (Fortran  REAL*16)  arithmetic,  to  assure  full  double  precision 
in  the  obtained  result.  In  Example  5.3,  the  calculations  were  performed  in  double  precision, 
since  the  accuracy  of  the  quadrature  listed  in  Table  5  is  only  9  digits. 

Example  5.1  An  obvious  problem  of  interest  is  the  integration  on  an  interval  of  functions 
that  have  a  singularity  at  one  end  of  that  interval  (or  at  both  ends);  of  particular  interest  are 
power  and  logarithmic  singularities.  Many  techniques  have  been  proposed  for  dealing  with  such 
problems  (see,  for  example,  [1]).  While  some  of  these  approaches  are  quite  effective  for  some 
of  the  singularities,  they  have  the  drawback  that  each  of  them  only  deals  with  one  particular 
singularity.  In  this  example,  we  present  quadrature  rules  for  the  integration  of  functions  of  the 
form 

n  m 

X  (Tit  •  l°9(x)  +  X  Pkj  ‘  )  *  pk(x)  (50) 

k=0  j= 1 

where  Pjt  denotes  the  (normalized)  orthogonal  polynomial  of  order  fc  on  the  interval  [0, 1],  fikj, 
7jt  are  arbitrary  real  numbers,  and  ay  are  arbitrary  real  numbers  on  the  interval  [—0.6, 1]. 
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Table  1:  16-node  quadrature  for  functions  of  the  form  (50),  with  a  €  [-0.6,1],  N  —  4,  and 
precision  10-15 


Xi 

Wi 

0. 1646476245461994E- 1 8 
0.2004881 755033 198E-1 3 
0.4902407997203263E- 10 

0. 1396853977847601E-07 

0.9715236454504147E-06 

0.2502196135803993E-04 

0.3 120851 1496731 10E-03 

0.2264576 163994000E-02 

0.1086917746927712E-01 

0.3777218640280392E-01 

0.1013279037973986E-(-00 

0.2196196157836697E+00 

0.3972680999338400E+00 

0.6135562966157080E+00 

0.8216868417553706E+00 

0.9636466562372551E+00 

0.2477997131959177E-17 

0.1863311166024058E-12 

0.3215991324579055E-09 

0.6788563189534853E-07 

0.3586206403622012E-05 

0.7130636866829449E-04 

0.6951436010759356E-03 

0.3979838127986921E-02 

0.1515746778330600E-01 

0.4182483334409624E-01 

0.8854031057518543E-01 

0. 1490380907486389E+00 
0.202831253845101 1E+00 
0.2216836945000430E+00 
0. 18445674481 10479E+00 
0.9171766188102896E-01 

In  order  to  design  such  quadratures,  we  choose  a  reasonably  large  natural  m,  construct  m 
Legendre  nodes  ai,  02 ,  •  •  • ,  ci'm ,  on  the  interval  [ — 0.6, 1],  and  use  all  functions  of  the  forms 

Pk(x)  ■  xaj ,  (51) 

Pk(x) ■ log(x)  (52) 

as  input  functions  <f>i  for  the  algorithm  of  the  preceding  section.  The  result  is  a  set  quadratures 
for  functions  of  the  forms  (51),  (52).  A  somewhat  involved  analytical  calculation  shows  that 
for  sufficiently  large  m,  the  obtained  quadratures  will  work  for  all  functions  of  the  form  (50), 
and  our  numerical  experiments  show  that  m  —  100  insures  full  double  precision  accuracy  for 
all  aj  £  [-0.6, 1]. 

In  Tables  1  -  5,  we  list  quadrature  nodes  and  weights  for  n  =  4, 9, 19, 29.  In  Tables  1,  3, 
4,  5,  the  number  of  nodes  is  chosen  to  guarantee  15-digit  accuracy.  In  Table  2,  the  number  of 
nodes  is  chosen  to  guarantee  7  digits. 

Example  5.2  The  quadrature  rules  in  this  example  are  very  similar  to  those  in  Example  5.1, 
except  here  we  construct  quadrature  rules  for  functions  singular  at  both  ends  of  the  interval 
where  they  are  to  be  integrated.  Specifically,  integrands  have  the  form 

n  m 

y  (y{a,kj  ■  (1  +  x)aj  +  bkj  ■  (1  —  x)Qj )  +  c*;  •  log(l  +x)  +  dk-  log(  1  —  x))  •  Pk(x)  (53) 
fc=0  j= 1 
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Table  4:  26-node  quadrature  for  functions  of  the  form  (50),  with  a  €  [—0.6,1],  N  —  19,  and 
precision  10~15 


Xi 

Wi 

0.2852686209735951E-20 

0.4655349788609637E-15 

0. 1432147899313873E-1 1 
0.4915792345704672E-09 
0.3986884553883893E-07 
0.1168849078081257E-05 
0.1630549221175312E-04 
0.1307331567674635E-03 
0.6884061227847875E-03 
0.2620448293548410E-02 
0.7740029188833982E-02 

0. 1 872452403074940E-0 1 
0.3869460001276389E-01 
0.7058074961479 188E-0 1 

0. 1 165353335503884E+00 
0. 1775282580420220E+00 
0.2531447462199369E+00 
0.3415558481256653E+00 
0.4396281348394975E+00 
0.54314472781971 11E+00 
0.6471126706707170E+00 
0. 7461308 154896283E+00 
0.8347900655356778E+00 
0.90807599998824 1 1E+00 
0.9617441758037388E+00 
0.9926478556999123E+00 

0.4390385492743041E-19 
0.4445881 189691443E-14 
0.9689649973398580E-11 
0.2471 786670704959E-08 

0. 1527652265503579E-06 
0.3470933550491954E-05 
0.3803166416108812E-04 
0.2422240257088061E-03 
0.1022568448159836E-02 
0.3143745934305781E-02 
0.7549238041954824E-02 
0.1495112040361046E-01 
0.2548756008178511E-01 
0.3865021281644121E-01 
0.5342389042306681E-01 
0.6849323863305738E-01 
0.8243302008328313E-01 
0.9386320384208941E-01 
0.1015733726852001E+00 

0. 10462 14551363520E+00 

0. 10240749639633 1 1 E+00 
0.9472049436813551E-01 
0.8175595131244442E-01 
0.6410309004863602E-01 
0.4270384642243640E-0 1 
0.1881261305258270E-01 
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Table  5:  36-node  quadrature  for  functions  of  the  form  (50),  with  a  G  [-0.6, 1],  N  —  39,  and 
precision  10“ 15 


Xi 

Wi 

0.1174238417413926E-19 

0.1422439193737780E-14 

0. 3350676698582048E- 1 1 
0.8987762100979194E-09 
0.5804062676082615E-07 
0.1381879982602796E-05 
0.1599014834456195E-04 

0. 1086072834052024E-03 
0.4939690780979653E-03 
0.1653457719227906E-02 
0.4371083474213578E-02 
0.9635942477742897E-02 
0.1847241513238332E-01 
0.3179190367214565E-01 
0.5030636405050507E-01 

0. 74494428689523 1 9E-0 1 

0. 1045979502135202E+00 
0.1406326475828715E+00 
0.1824044449022998E+00 
0.2295280679235570E+00 
0.28 14468220422235E+00 
0.3374533767644982E+00 
0.3967116179369689E+00 
0.4582796041927400E+00 
0.521 1335571597729E+00 
0.5841926980689389E+00 
0.6463446423449487E+00 
0.7064709858680002E+00 
0.7634726623238107E+00 
0.8 162946 1 87294954E+00 
0.8639493438008133E+00 
0.9055387898384755E+00 
0.940274254235763 1 E+00 
0.9674938463383342E+00 
0.9866773942995437E+00 
0.9974613070359063E+00 

0.1 76904259638 1 234E- 1 8 
0.1318732300270049E-13 
0.2181187238172082E-10 
0.4306047388907762E-08 
0.2097251047066944E-06 
0.3830347070073085E-05 
0.3447814965093908E-04 
0.1843012333973045E-03 
0.6658876227138618E-03 
0.1785581170381193E-02 
0.3817614649487054E-02 
0.6885390581283880E-02 

0. 1094085630140653E-01 
0.1581728538518057E-01 
0.2129142636454853E-01 
0.2712481569656370E-01 
0.3308456773919071E-01 
0.3895216905306892E-01 
0.4452688339606666E-01 
0.4962723403902098E-0 1 
0.5409202 1691 30247E-01 
0.5778135262022458E-01 
0.6057773920656186E-01 
0.6238718893653459E-01 
0.6314016307782669E-01 
0.6279229386348975E-01 
0.6132477029316637E-01 
0.5874432665998542E-01 
0.5508279084756487E-01 
0.5039617034177984E-01 
0.4476327290202123E-01 
0.3828387702474601E-01 
0.3107648956468336E-01 
0.2327578565976658E-01 
0.1503024417658587E-01 
0.6508977351752366E-02 
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Table  6:  22-node  quadrature  for  functions  of  the  form  (53),  with  a  e  [—0.1,1],  N  —  4,  and 
precision  10~15 


±Xi 

Wi 

0. 1 666008 11931 6040E+00 
0.473646793756 1296E+00 
0.71 294639000 1 7805E+00 
0.8687173264995090E+00 
0.9515411665787298E+00 
0.9862971262509680E+00 
0.9972429072629104E+00 
0.9996464539418006E+00 
0.9999757993153293E+00 
0.9999993605804343E+00 
0.9999999970230195E+00 

0.3286464553329054E+00 
0.2782402062916909E+00 
0. 1977249261400840E+00 

0. 1 158087624474726E+00 
0.5425992604604305E-01 

0. 19438741 13675287E-01 
0.4979788483749470E-02 
0.8238003428108275E-03 
0.7462712208720397E-04 
0.2746237603563529E-05 
0.2041880191195951E-07 

where  P*  denotes  the  (normalized)  orthogonal  polynomial  of  order  k  on  the  interval  [—1,1], 
ak<v  bkj ,  cjfc,  <4  are  arbitrary  real  numbers,  and  a,  are  arbitrary  real  numbers  on  the  interval 
[—0.1,1].  Quadrature  nodes  and  weights  for  n  =  4,9,19,39  are  listed  in  Tables  6,  7,  8,  9 
respectively;  in  all  cases,  the  precision  is  10-15. 

Example  5.3  In  this  example,  we  construct  a  direct  generalization  of  quadratures  constructed 
in  Example  5.1,  permitting  the  integrands  to  have  power  and  logarithmic  singularities  at  ar¬ 
bitrary  points  on  the  closed  half-line  to  the  left  of  the  interval  of  integration.  Specifically, 
integrands  have  the  form 

n  m 

£(7 *  •  log{x  +  h)  +  Pkj  1  (*  +  h)aj )  •  ft  (*)  (54) 

k= 0  j= 1 

where  Pk  denotes  the  (normalized)  orthogonal  polynomial  of  order  k  on  the  interval  [0, 1],  /4j, 
7*  are  arbitrary  real  numbers,  ctj  are  arbitrary  real  numbers  on  the  interval  [—0.65, 1],  and  h 
is  an  arbitrary  positive  real  number.  In  this  case,  the  calculations  were  conducted  in  double 
precision;  the  38-node  quadrature  formula  for  n  =  19  is  listed  in  Table  10;  its  precision  is  10~9. 

Several  observations  can  be  made  from  the  tables  1-8,  and  from  the  more  detailed  numerical 
experiments  we  have  conducted. 

•  1.  The  algorithm  of  this  paper  is  always  effective  for  Chebyshev  systems;  it  almost  always 
works  for  non-Chebyshev  ones. 

•  2.  The  scheme  does  not  lose  very  many  digits  compared  to  the  machine  precision;  when 
the  calculations  are  performed  in  double  precision,  the  quadratures  can  be  obtained  to  11 
or  12  digits;  the  accuracy  of  quadratures  in  Tables  1-8  is  full  double  precision;  we  used 
extended  precision  arithmetic  in  FORTRAN  to  obtain  them. 
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Table  7:  27-node  quadrature  for  functions  of  the  form  (53),  with  a  E  [—0.1,1],  N  —  9,  and 
precision  10“ 15 


±Zi 

Wi 

O.OOOOOOOOOOOOOOOOE+OO 
0.1953889665467211E+00 
0.3814298736462841E+00 
0.5496484616443740E+00 
0.6932613279607421E+00 
0.8078808016610349E+00 
0.8920478424190657E+00 
0.9475053154471952E+00 
0.9790448975 739819E+00 
0.9936444652327659E+00 
0.998693638631 1707E+00 
0.9998477986092101E+00 
0.9999927156219827E+00 
0.9999999335937359E+00 

0. 1969765126094452E+00 
0. 19222871 1 1905558E+00 
0.1 784269782500965E+00 

0. 15686774853509 1 3E-I-00 

0. 1296 1 76364576521 E+00 
0.9937321489137896E-01 
0.69253 1 79 1783766 1E-0 1 
0.42473968 1 8782292E-0 1 
0.2179872525134398E-01 
0.8672220251831 163E-02 
0.2388475528070173E-02 
0.383764865376993 IE-03 
0.267142277754143 IE-04 
0.4068798910349743E-06 

Table  8:  33-node  quadrature  for  functions  of  the  form  (53),  with  a  €  [—0.1,1],  N  =  19,  and 
precision  10~15 


±rCi 

Wi 

O.OOOOOOOOOOOOOOOOE+OO 
0. 1789856568226836E+00 
0.3505713663705831E+00 
0.5079970396268890E+00 
0.6457344058749438E+00 
0.7599840782344723E+00 
0.8490304782768580E+00 
0.9134021329241244E+00 
0.955771 7316319267E+00 
0.9805181730564275E+00 
0.9929045523533901E+00 
0.9979798758935006E+00 
0.9995837651123616E+00 
0.9999445617386989E+00 
0.9999960165362139E+00 
0.9999998889650372E+00 
0.9999999994557687E+00 

0. 1802406542699465E+00 

0. 1764865559769247E+00 

0. 1 655482040246752E+00 

0. 1483733690643724E+00 

0. 126462095653522 1 E+00 
0.1017484935648103E+00 
0.764338617140883 1E1-0 1 
0.5276203409291 129E-01 
0.32720864268082 1 8E-0 1 
0.1766845539228831E-01 
0.7963812531655223E-02 
0.2833884283485953E-02 
0.7387521680930171E-03 

0. 1267394032662049E-03 
0.1207609748958691E-04 
0.4709227238502033E-06 
0.3706639850258617E-08 
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Table  9:  45-node  quadrature  for  functions  of  the  form  (53),  with  a  €  [—0.1,1],  N  —  39,  and 
precision  10“ 15 


±Ii 

Wi 

O.OOOOOOOOOOOOOOOOE+OO 
0.1135283181390291E+00 
0.2253080046824045E+00 
0.3336364252858657E+00 
0.43690240523569 1 1E+00 
0.533630670789 1807E+00 
0.6225248777667337E+00 
0.7025089656717720E+00 
0.77276671 18189729E+00 
0.8327794264993337E+00 
0.8823615451977041E+00 
0.9216930322777481E+00 
0.9513451962287941E+00 
0.9722913641056944E+00 
0.9858845322639776E+00 
0.9937724959340503E+00 
0.9977200386244100E+00 
0.9993454278943935E+00 
0.9998636273258416E+00 
0.9999815974719829E+00 
0.9999986596740707E+00 
0.9999999622133619E+00 
0.9999999998 137450E+00 

0.1138212938786054E+00 

0. 1 129431358863252E+00 
0.1103317059272695E+00 

0. 1060558645237672E+00 

0. 1002294986469973E+00 

0.9301028558331059E-01 

0.8459812566475355E-01 

0.7523338442881639E-01 

0.6519506433099722E-01 

0.5479889055074179E-01 

0.4439489209928996E-0 1 

0.3436308131973152E-01 

0.2510376733595393E-01 

0.1701539437521317E-01 

0. 1044852849794223E-01 
0.5626146436355554E-02 
0.2543352365327656E-02 
0.9118380718941661E-03 
0.2403706487446808E-03 
0.418 1929949775085E-04 
0.4045883666118617E-05 

0. 1599158044436823E-06 

0. 126829676771 1113E-08 
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Table  10:  38-node  quadrature  for  functions  of  the  form  (54),  with  a  (E  [—0.65, 1],  N  —  19,  and 
precision  10-9 


Xi 

wt 

0.7629165866352161E-18 
0.3799719398931375E-16 
0.5684549949701512E-15 
0.6085909916179373E-14 
0.5277191865393953E-13 
0.3900442913791902E-12 
0.2535538557277294E-11 
0.1481755662897140E-10 
0.7911595380511587E-10 
0.3907746000477183E-09 
0.1803070816493823E-08 
0.7833265344260583E-08 
0.3224897189563689E-07 
0.1264894823726299E-06 
0.4747932260937661 El-06 
0.171 1978528765632E1-05 
0.5948052018171647E1-05 

0. 1995877304286260E1-04 
0.6475274273537152E1-04 
0.2029004100170709E1-03 
0.6109309950274235E1-03 
0.1747449285439932E-02 
0.4661579935095226EJ-02 
0.1135932523990354E1-01 
0.2491532030262493E1-01 
0.4902801284057732E1-01 
0.8713816071641225E1-01 

0. 14155141 75271372E+00 
0.2128806314974303E+00 
0.2998564528132552E+00 
0.3994239415560721E+00 
0.50703138671 13639E+00 
0.617041 1438386144E+00 
0.7232121752054713E+00 
0.8192137516286219E+00 
0.8991333728333283E+00 
0.9579443204807173E+00 
0.9919093183441774E+00 

0.46439553332686 10E- 1 7 
0.1132690565299208E-15 
0.1423549582265871E-14 
0.1371876219104025E-13 
0.1094397021531007E-12 
0.7534990994077416E-12 
0.4603432835276850E-1 1 
0.2545533729683496E-10 

0. 1293022088581050E-09 
0.6102781198001779E-09 
0.2700678436986 190E-08 

0. 1 128792193586090E-07 

0.4482855569803782E-07 

0.1700035548631482E-06 

0.6182057321480894E-06 

0.2 163 1 08715557027E-05 

0.7302447810573277E-05 

0.2382492261847977E-04 

0.7511062044871306E-04 

0.2279609908900293E-03 

0.6592765068003472E-03 

0.1781666222619331E-02 

0.4378093849756735E-02 

0.9537600800370288E-02 

0.1820679046524441E-01 

0.3060746663786768E-01 

0.4600643316091537E-01 

0.6292513465068938E-01 

0.7951989233968431E-01 

0.9391761648476182E-01 

0.1044517799613406E+00 

0.1098153664961849E+00 

0.1091553255900476E+00 

0.1021230666276667E+00 

0.8888680524875885E-01 

0.7010796674100402E-01 

0.4688508195206744E-01 

0.2069742637648333E-01 
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•  3.  The  algorithm  of  this  paper  is  not  very  efficient.  For  example,  the  quadrature  formula 
in  Table  1  took  about  2  minutes  of  CPU  time  on  Ultra  SPARC  2;  the  quadrature  in  Table 
8  took  about  two  hours  of  CPU  time.  Of  course,  extended  precision  on  the  UltraSparc 
is  quite  inefficient;  in  double  precision,  Table  8  took  about  4  minutes  on  to  construct.  In 
any  event,  the  quadratures  of  the  type  presented  in  this  paper  need  not  be  constructed 
“on  the  fly”;  the  nodes  and  weights  can  be  precomputed  and  stored.  From  this  point 
of  view,  the  CPU  time  requirements  of  our  algorithm  are  not  excessive.  Still,  its  CPU 
time  requirements  grow  as  n3  for  large  n,  making  it  unsuitable  for  the  construction  of 
quadratures  of  very  high  order. 


6  Generalizations  and  Conclusions 

We  have  constructed  a  scheme  for  the  design  of  Generalized  Gaussian  Quadratures  for  a  fairly 
broad  classes  of  functions.  The  results  presented  here  should  be  viewed  as  somewhat  experi¬ 
mental,  since  while  the  algorithm  appears  to  work  under  quite  general  conditions,  we  can  only 
prove  that  it  has  to  work  for  Chebyshev  systems. 

Several  possible  extensions  of  the  work  suggest  themselves. 

1.  While  our  numerical  experiments  indicate  that  the  scheme  of  this  paper  works  under  very 
general  conditions,  we  have  only  been  able  to  prove  that  it  has  to  work  for  Chebyshev  systems 
(see  Subsection  2.1  above).  This  discrepancy  seems  to  indicate  that  it  might  be  profitable  to 
investigate  generalizations  of  Theorem  2.1  to  sets  of  functions  other  than  Chebyshev  systems. 

2.  By  combining  Observation  2.3  and  Remark  2.4  with  results  in  Sections  3,  4,  it  is  fairly 

straightforward  to  construct  algorithms  for  the  efficient  interpolation  of  fairly  large  classes  of 
singular  functions.  For  example,  the  nodes  -  •  • > x36  in  Table  5  lead  to  a  stable  interpola¬ 

tion  formula  on  the  interval  [0, 1]  for  all  functions  of  the  form 

n  m 

^2Pk(x)-Y^Pkj-x^,  (55) 

k=0  1 

with  —0.3  <  ctj  <  1,  0  <  A;  <  19,  and  the  precision  of  interpolation  10~15.  Interpolation 
schemes  of  this  type  are  currently  under  vigorous  investigation,  and  will  be  reported  in  the 
near  future. 

3.  In  many  situations  (especially,  in  the  numerical  solution  of  partial  differential  equations), 
it  is  desirable  to  have  “quadrature”  formulae  that,  in  addition  evaluating  integrals,  would 
evaluate  certain  pseudodifferential  operators,  i.e.  derivative,  Hilbert  Transform,  derivative  of 
the  Hilbert  Transform,  etc.  Clearly,  such  “quadratures”  can  not  have  positive  weights,  except 
for  the  Hilbert  Transform.  Several  such  quadratures  have  been  constructed  numerically,  and 
the  appropriate  theory  appears  to  be  fairly  straightforward;  this  work  will  be  reported  at  a 
later  date. 

4.  While  the  theory  of  Gaussian  Quadratures  in  one  dimension  is  extremely  simple  and  well- 
understood,  no  similar  theory  exists  in  higher  dimensions,  except  for  a  few  scattered  results 
(see,  for  example,  [9], [14]).  The  approach  of  this  paper  is  quite  different  from  the  classical 
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Gaussian  Quadratures,  and  it  appears  possible  to  generalize  it  (at  least,  formally)  to  higher 
dimensions.  While  the  advantages  of  such  a  construction  would  be  significant,  our  investigation 
of  it  is  at  a  very  early  stage.  If  successful,  it  will  be  reported  at  a  later  date. 
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Ever  since  its  introduction  in  the  1980’s,  the  Past  Multipole  Method  has  been  capable  of 
producing  very  high  accuracy  for  an  acceptable  cost  in  two  dimensions;  in  three  dimensions, 
it  has  been  considerably  less  efficient,  except  when  the  accuracy  requirements  were  low. 
This  situation  changed  somewhat  with  the  appearance  of  a  new  version  of  the  FMM  in  [12], 
which  is  highly  efficient  over  a  wide  range  of  accuracies.  That  paper  introduced  a  rather 
involved  mathematical  apparatus  and  described  the  algorithm  in  its  simplest,  non-adaptive 
form.  In  this  paper,  we  describe  an  adaptive  version  of  the  scheme  of  [12],  applicable  to  all 
distributions  of  particles  that  are  likely  to  be  encountered  in  practice.  The  performance  of 
the  algorithm  is  illustrated  with  numerical  examples. 
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H.  Cheng,  L.  Greengard  and  V.  Rokhlin 
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1  Introduction 


In  [12],  a  new  version  of  the  Fast  Multipole  Method  (FMM)  for  the  evaluation  of  potential 
fields  in  three  dimensions  was  introduced.  The  scheme  evaluates  all  pairwise  interactions  in 
large  ensembles  of  particles,  i.e.  expressions  of  the  form 


$(xj)  =  £ 


Qi 


tel 


\Xj  Xi\\ 


(i) 


for  the  gravitational  or  electrostatic  potential  and 


E(xj)  -  H9*' 

i=l 


Xj  -  gj 

I  \xj  -  z*ll3 


(2) 


for  the  field,  where  x\, xi,  •  •  • ,  xn  are  points  in  R3,  and  <ft,  $>,  •  •  • ,  qn  are  a  set  of  (real)  coeffi¬ 
cients. 

The  evaluation  of  expressions  of  the  form  (1)  is  closely  related  to  a  number  of  important 
problems  in  applied  mathematics,  physics,  chemistry,  and  biology.  These  include  molecu¬ 
lar  dynamics  and  quantum-mechanical  simulations  in  chemistry,  the  evolution  of  large-scale 
gravitational  systems  in  astrophysics,  capacitance  and  inductance  calculations  in  electrical  en¬ 
gineering,  and  incompressible  fluid  dynamics.  When  certain  closely  related  interactions  are 
considered  as  well,  involving  expressions  of  the  form 


iml 


||Xj— X*  || 

i*i-**r 


(3) 


the  list  of  applications  becomes  even  more  extensive. 

Ever  since  its  introduction  in  the  1980’s,  the  FMM  has  been  capable  of  producing  very  high 
accuracy  for  an  acceptable  cost  in  two  dimensions;  in  three  dimensions,  it  has  been  considerably 
less  efficient,  except  when  the  accuracy  requirements  were  low.  This  situation  changed  some¬ 
what  with  the  development  of  a  new  version  of  the  FMM  in  [12],  which  is  highly  efficient  over 
a  wide  range  of  accuracies.  That  paper  introduced  a  rather  involved  mathematical  apparatus 
and  described  the  algorithm  in  its  simplest,  non-adaptive  form. 
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Needless  to  say,  most  charge  distributions  encountered  in  applications  are  highly  non- 
uniform,  and  to  be  robust,  a  procedure  for  the  evaluation  of  sums  of  the  form  (1)  or  (2) 
has  to  be  adaptive.  In  this  paper,  we  introduce  such  a  scheme,  applicable  to  all  distributions  of 
particles  that  are  likely  to  be  encountered  in  practice.  An  additional  improvement  introduced 
in  this  paper  is  a  “compressed”  version  of  translation  operators  used  by  the  FMM  procedure, 
which  is  the  principal  reason  for  the  improvement  of  the  timings  found  in  Section  7  below  over 
those  in  [12]. 

The  paper  is  organized  as  follows.  In  Section  2,  we  summarize  the  mathematical  and  numer¬ 
ical  facts  to  be  used  in  subsequent  sections.  In  Section  3,  we  review  the  analytical  apparatus  to 
be  used  in  the  design  of  the  improved  version  of  the  FMM.  Section  4  recapitulates  the  algorithm 
of  [12],  to  be  used  as  the  starting  point  in  the  construction  of  the  scheme  of  this  paper.  In 
Section  5,  we  describe  the  adaptive  version  of  the  FMM  and  make  some  comparisons  with  tree 
codes.  In  Section  6,  we  illustrate  the  performance  of  the  method  with  several  numerical  exam¬ 
ples.  Finally,  Section  7  discusses  several  possible  generalizations.  For  a  review  of  FMM-type 
methods  and  a  more  thorough  discussion  of  the  literature,  we  refer  the  reader  to  [12]. 


2  Mathematical  preliminaries 


In  this  section,  we  review  the  analytical  tools  used  in  the  design  of  the  FMM  algorithm.  For  a 
detailed  discussion,  see  [15, 14,  21,  9,  12]. 

We  begin  by  defining  the  spherical  harmonics  of  degree  n  and  order  m  according  to  the 
formula 

(4) 

Here,  the  special  functions  are  the  associated  Legendre  functions,  which  can  be  defined  by 
the  Rodrigues’  formula 

«■(*)  =  -  Jr'^Pn  (*)> 

where  Pn(x)  denotes  the  Legendre  polynomial  of  degree  n. 


Theorem  2.1  (Multipole  Expansion).  Suppose  that  N  charges  of  strengths  51,92,  ••• ,  Qn 
are  located  at  points  X\,X2,...,Xn  with  spherical  coordinates  (pi,cti,0i),  (P2, /%),••  •, 
(Pn,<*n,Pn),  respectively.  Suppose  further  that  the  points  X\,X2,...,X ft  are  located  inside 
a  sphere  of  radius  a  centered  at  the  origin.  Then,  for  any  point  X  =  (r,  6 ,  <f>)  €  R3  with  r>  a, 
the  potential  $(X),  generated  by  the  charges  5i,92,-*’,9Ar,  is  given  by  the  formula 

00  n  A/171 

=  E  E  (5) 

n=0m=- n 


where 


i=l 


(6) 
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Furthermore,  for  any  p  >  1, 


*m-£  £ 


n=0  7?i=  — n 


AC 

rn+l 


Y?V,4>) 


(“•“XT' 


(7) 


The  preceding  theorem  describes  an  efficient  representation  of  the  far  field  due  to  a  collection 
of  sources.  Within  the  FMM,  it  is  also  useful  to  be  able  to  describe  the  field  locally  when  the 
charges  themselves  are  far  away. 


Theorem  2.2  (Local  Expansion)  Suppose  that  N  charges  of  strengths  qi,Q2,'“  iQN  ore  lo¬ 
cated  at  the  points  X\,X2,  •  •  • , X n  in  R3  with  spherical  coordinates  (pi,  (p2, <*2> Pi),  •  •  '> 

(PNi  on,  Pn)  respectively.  Suppose  further  that  all  the  points  X\,X2,  •  •  •  ,Xn  ore  located  outside 
the  sphere  Sa  of  radius  a  centered  at  the  origin.  Then,  for  any  point  X  €  Sa  with  coordinates 
(r,6,<f>),  the  potential  $(X)  generated  by  the  charges  qi,q2,”~  ,qN  is  described  by  the  local 
expansion: 

HX)  =  E  E  Lt-VjV,*)"*  ,  (8) 

j=Ok=-j 


where 


rk  £  Y-^aufr) 

^•  =  E«‘  1  -i+ 1 

i=i  pi 


(9) 


with  A!£  defined  by  (14).  Furthermore,  for  any  p>  1, 

*(*)-£  £  (j)”*1  •  do) 

2.1  Translation  Operators 

The  FMM  relies  on  the  ability  to  translate  multipole  and  local  expansions.  The  relevant 
translation  operators  are  described  in  the  next  three  theorems  [11,  9]. 


Theorem  2.3  (Translation  of  a  Multipole  Expansion)  Suppose  that  N  charges  of  strengths 
qi,q2,‘"  ,Qn  ore  located  inside  the  sphere  D  of  radius  a  centered  at  Xq  —  (p,a,P).  Suppose 
further  that  for  any  point  X  =  (r,  6,  <f>)  €  R3  \  D,  the  potential  due  to  these  charges  is  given  by 
the  multipole  expansion 

00  n  nm 

*(*)  =  £  £  (ii) 

71=0771=— 71  * 

where  (r\  & ,  4>')  are  the  spherical  coordinates  of  the  vector  X  —  Xq. 
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Then,  for  any  point  X  —  (r,  6,  <f>)  outside  a  sphere  D\  of  radius  (a  +  p)  centered  at  the 
origin, 

o°  i  ftfk 

(12) 

j=0k=-j  T 

where 


j  n  Qk-m  .  -|fc|-|m|-|fc-m| .  Am  .  Ak-m  .  n  .  y- 

_  V'  J-n  1 _ 2n  2j- w  K  Jn 


'(*,0) 


n=0m=-n 


(13) 


u/it/i  A£  defined  by  the  formula 


,m  -  (-»)" 

\/(n  — m)!  •  (n  +  m)! 

Furthermore,  for  any  p  >  1, 


*w  -EE 

i=0Jb=-; 


M* 
.  ri+i 


Y?{e,<t>)\< 


(  Efei  tel  A  + 

^r-(c  +  p)yl  r  )  ’ 


(14) 


(15) 


Definition  2.1  Formula  (13)  defines  a  linear  operator  converting  the  multipole  expansion 
coefficients  {Oj}  into  the  multipole  expansion  coefficients  {M*}.  This  linear  mapping  will  be 
denoted  by  Tmm- 


Theorem  2.4  (Conversion  of  a  Multipole  Expansion  to  a  Local  Expansion)  Suppose 
that  N  charges  of  strengths  Qi,Q2,'  ' ' ,  Qn  are  located  inside  the  sphere  Dx0  of  radius  a  centered 
at  the  point  Xo  =  ( p,a,/3 ),  and  that  p  >  (c  +  l)o  for  some  c  >  1.  Then  the  corresponding 
multipole  expansion.  (11)  converges  inside  the  sphere  Dq  of  radius  a  centered  at  the  origin. 
Furthermore,  for  any  point  X  G  Do  with  coordinates  ( r,0,<f> ),  the  potential  due  to  the  charges 
9ii  92>  *  •  ■  >  Qn  is  described  by  the  local  expansion: 


*m=E  E  ri, 

j=Ok=-j 


where 


y-  0?  •  jll-H-W-H  -AS-Af-  YjgW) 


n=Om=— n 

with  Alff  defined  by  (14).  Furthermore,  for  any  p  >  1, 


(16) 


(17) 


(18) 
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Definition  2.2  Formula  (17)  defines  a  linear  operator  converting  the  multipole  expansion 
coefficients  {0*}  into  the  local  expansion  coefficients  {Lk}.  This  linear  mapping  will  be  denoted 
by  Tml- 

Theorem  2.5  (Translation  of  a  Local  Expansion)  Suppose  that  Xq,X  are  a  pair  of  points 
in  R3  with  spherical  coordinates  (p,a,/3),(r,0,<f>)  respectively,  and  ( r',9 ' ,<f>')  are  the  spherical 
coordinates  of  the  vector  X — Xq  and  p  is  a  natural  number.  Let  Xo  be  the  center  of  a  p th-order 
local  expansion  with  p  finite,  its  expression  at  the  point  X  is  given  by  the  formula 

=  E  E  onm-W^')T,n.  (is) 

n=0m=— n 


Then 

T.L)-  *}(»•*)■  ’J. 

j=Ok--j 


(20) 


everywhere  in  R3,  with 

V  n 

3-E  £ 

nszj  m=— n 


0m  .  t.|m|—|m— fc|—|fc| .  jm-k  .  A*  .  Y£rk(a,0)  ■  pn~i 

(_1  )»+i.im 


and  A£  are  defined  by  (14). 


(21) 


Definition  2.3  Formula  (21)  defines  a  linear  operator  converting  the  local  expansion  coeffi¬ 
cients  {O™}  into  the  local  expansion  coefficients  {L™}.  This  linear  mapping  will  be  denoted 
by  Tll- 


Remark  2.1  The  matrices  representing  the  linear  operators  Tmm,  Tml >  and  Tll  are  dense, 
so  that  applying  them  to  truncated  expansions  with  Oip2)  coefficients  costs  0(p4)  operations. 
This  is  one  of  principal  reasons  for  the  relatively  high  CPU  time  requirements  of  most  existing 
FMM  implementations  in  three  dimensions.  Section  3  of  this  paper  provides  tools  for  the  rapid 
application  of  the  operators  Tmm,  Tml,  Tll  to  arbitrary  vectors,  improving  the  efficiency  of 
FMM  algorithms  significantly. 


2.2  Rotation  Operators 

In  this  subsection,  we  introduce  operators  which  transform  multipole  and  local  expansions 
under  rotations  of  the  coordinate  system.  These  operators  will  play  a  role  in  Section  3.  The 
basic  results  are  contained  in  the  next  two  theorems,  whose  proofs  can  be  found  in  [3],  together 
with  formulae  for  the  evaluation  of  the  coefficients  in  (22),  (23). 
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Theorem  2.6  (Rotation  of  Multipole  Expansions)  Suppose  that  (ei,  62,63)  are  the  three 
standard  orthonormal  basis  vectors  in  R3,  given  by  the  formulae 

61  =  (1,0,0), 

e2  =  (0,1,0), 

C3  =  (0,0,1), 

and  (coi,u> 2,^3)  ore  three  other  orthonormal  verctors  in  R3,  forming  another  basis. 

Suppose  further  that  a  harmonic  function  $  :  R3  \  {0}  t-f  R  is  defined  by  the  formula 

P  n  Turn 

*(*)  =  E  E 

n=0m=-n 

with  (r,  6, 4>)  the  spherical  coordinates  of  the  point  X  €  R3  associated  with  the  basis  (ei,  e2, 63). 
Then,  there  exist  coefficients  R£,m'  with  n  =  0,l,***,p,  m  =  — n, ...,n,  m'  =  — n, such 
that  for  any  X  €  R3, 

P  n  Turm‘ 

=  E  E  M' 

n=0  m'=— n 

where  (r,  & ,  <f>')  are  spherical  coordinates  of  X  in  the  system  of  coordinates  associated  with  the 
basis  (wi,W2,«3),  and 

AC'=  E  (22) 

m=— n 

for  all  n  =  0, 1,  ...,p,  m'  =  — n, ...,  n. 

Theorem  2.7  (Rotation  of  Local  Expansions)  Under  the  conditions  of  Theorem  2.6,  sup¬ 
pose  that  a  harmonic  function  $  :  R3  t-f  R  is  defined  by  the  formula 

=  E  E  K- 

n=0  m=— n 

where  (r,  6,  <f>)  are  the  spherical  coordinates  of  the  point  X  €  R3  associated  with  the  basis 
(e*,  e2, 63).  Then  for  any  X  €  R3, 

®m  =  E  E 

n=0  m,= — n 

where  (r,  ,  <£')  ore  spherical  coordinates  of  X  in  the  system  of  coordinates  associated  with  the 
basis  (<*>!,  u>2,W3),  and 

£?’=  E  (23) 

m=— n 

/or  all  n  =  0,1,...,  p,  m'  =  — n,...,n.  Furthermore,  the  coefficients  K£,m'  are  the  same  as  in 

(22). 
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Definition  2.4  Given  a  rotation  fi  :  R3  t-+  R3,  formulae  (22),  (23)  define  operators  converting 
the  multipole  coefficients  {M^1}  into  the  multipole  coefficients  {MJ?}  and  the  local  coefficients 
{£™}  into  the  local  coefficients  {It™},  respectively.  These  two  operators  are  identical,  and  will 
be  denoted  by  72.(0). 

Remark  2.2  An  inspection  of  formulae  (22),  (23)  shows  immediately  that  the  numerical  eval¬ 
uation  of  the  operator  72(0)  requires  0(p3)  operations. 


2.3  Exponential  representation 

The  new  generation  of  FMMs  is  based  on  a  combination  of  multipole  expansions  and  exponen¬ 
tial  or  “plane  wave”  expansions.  Given  a  source  point  P  =  ( xo,yo,zo )  and  a  target  location 
Q  =  ( x ,  y,  z),  with  z  >  zo  and  r  =  \\P  —  Q\\,  we  begin  with  the  formula  [16] 

I  _  _L  f00  o)  f2*  ca((i-*o)  cos  Q+(y-yo)  ^^dadX  (24) 

r  2ir  Jo  Jo 

We  will  construct  approximations  to  the  integral  in  (24)  via  appropriately  chosen  quadrature 
formulae.  These  quadratures  are  investigated  in  detail  in  [23];  in  the  following  lemma,  we 
simply  state  the  result  for  three  special  cases,  corresponding  to  three-digit,  six-digit  and  nine¬ 
digit  accuracy. 


Lemma  2.8  ([23,  12])  Suppose  that  Xo  =  (xo,yo,zo),  X  —  ( x,y,z )  are  a  pair  of  points  in  R3, 
and  that  r  —  ||Al  —  ATo||.  Suppose  further  that  the  coordinates  (x  —  xo,y  —  yo,z  —  zo)  of  the 
vector  X  —  Xq  satisfy  the  conditions 


Then 


1  <  2  -  zo  <  4,  0  <  \j{x- xo)2  +  (y  - yo)2  <  4V2. 


(25) 


,  8 

I  _  Y'  Y'  e-AM(*-*o)-t(i-*o)-cos(Q?  k)-(»-vo).«n(a?(t)] 

r  Ml  “ 


k= 1  j= 1 


<  1.6  x  ltr3,  (26) 


1 

r 


fi  Ml 
*  j=i 


e-Aj-[(*-zo)-*(*-*o)-<»8(a?t)-(y-j«,)«m(aJft)] 


<  1.3  X  10"6, 


(27) 


where  c$k  =  2 irj/Ml,  =  2irj/M%,  =  2nj/M%.  The  weights  {ujf,l  =  1,...,8},  {wf,l  = 

1,...,17},  {t of,/  =  1,...,26},  the  nodes  {Af,i  =  1,...,8},  {Af , /  =  1,...,17},  {Af,Z  =  1,...,26} 
and  the  integer  arrays  {M|,fc  =  1,...,8},  {M%,k  =  1,...,17},  {M%,k  =  1,...,26}  are  given  in 
Tables  IS,  14,  15  of  the  Appendix,  respectively. 
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Remark  2.3  The  conditions  (25)  in  the  preceding  Lemma  appear  to  be  rather  special.  They 
are,  however,  related  to  the  geometric  refinement  of  space  introduced  by  the  FMM  and  their 
use  will  become  clear  in  the  next  section. 


Remark  2.4  When  the  desired  precision  is  clear  from  the  context,  we  will  simplify  the  notation 
used  in  Lemma  2.8,  writing  each  of  the  expressions  (26),  (27),  (28)  in  the  form 


1  t(e)  wu  Mk 

i  _  ^  wk  sr 

T  Aft 

k=l  ±  k  j= 1 


g--Mz-.ro)  .  g*Afc-[(*-io)-cos(ai,t)+(y-j«))-8m(ai,i)] 


<£, 


(29) 


where  the  integers  s(e)  and  the  triplets  {Mjt.to*, A*|  k  =  l,...,e}  all  depend  on  e,  and 
Qj,k  —  2irj/Mk .  The  total  number  of  exponential  basis  functions  used  in  (29)  will  be  denoted 
by 


o(e) 

Sexp  —  yi  Mk-  (30) 

k=l 

3  Data  Structures  and  Fast  Translation  Operators 

In  order  to  develop  a  fast  algorithm,  we  first  define  the  computational  domain  to  be  the 
smallest  cube  in  R3  containing  all  sources.  We  then  build  a  hierarchy  of  boxes,  refining  the 
computational  domain  into  smaller  and  smaller  regions.  At  refinement  level  0,  we  have  a  single 
box  corresponding  to  the  entire  computational  domain.  Refinement  level  /  +  1  is  obtained 
recursively  from  level  l  by  the  subdivision  of  each  box  into  eight  cubic  boxes  of  equal  size.  In 
the  nonadaptive  case,  this  recursive  process  is  halted  after  roughly  logg  N  levels,  where  N  is 
the  total  number  of  sources  under  consideration. 

Definition  3.1  A  box  c  is  said  to  be  a  child  of  box  b,  if  box  c  is  obtained  by  a  single  subdivision 
of  box  b.  Box  b  is  said  to  be  the  parent  of  box  c. 

Definition  3.2  Two  boxes  are  said  to  be  colleagues  if  they  are  at  the  same  refinement  level 
and  share  a  boundary  point.  (A  box  is  considered  to  be  a  colleague  of  itself.)  The  set  of 
colleagues  of  a  box  b  will  be  denoted  by  Coll(b). 

Definition  3.3  Two  boxes  are  said  to  be  well  separated  if  they  are  at  the  same  refinement 
level  and  are  not  colleagues. 

Definition  3.4  With  each  box  b  is  associated  an  interaction  list,  consisting  of  the  children  of 
the  colleagues  of  b's  parent  which  are  well  separated  from  box  b  (Figure  1). 

Note  that  a  box  can  have  up  to  27  colleagues  and  that  its  interaction  list  contains  up  to 
189  boxes.  Figure  1  depicts  the  colleagues  and  interaction  list  of  a  box  in  a  two-dimensional 
setting. 
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Figure  1:  The  colleagues  of  a  (two-dimensional)  box  b  are  darkly  shaded,  while  its  interaction  list 
is  indicated  in  white.  In  three  dimensions,  a  box  b  has  up  to  27  colleagues  and  its  interaction  list 
contains  up  to  189  boxes. 

The  interaction  list  for  each  box  will  be  further  subdivided  into  six  lists,  associated  with  the 
six  coordinate  directions  (+z,  —z,  +y,  —y,  +x,  —x)  in  the  three  dimensional  coordinate  system. 
We  will  refer  to  the  +z  direction  as  up,  the  —  z  direction  as  down,  the  +y  direction  as  north, 
the  —y  direction  as  south,  the  +®  direction  as  east,  and  the  —x  direction  as  west 

Definition  3.5  (Directional  lists) 

The  Uplist  for  a  box  b  consists  of  those  elements  of  the  interaction  list  which  lie  above  b  and 
are  separated  by  at  least  one  box  in  the  +z-direction  (Fig.  2). 

The  Downlist  for  a  box  b  consists  of  those  elements  of  the  interaction  list  which  lie  below  b 
and  are  separated  by  at  least  one  box  in  the  — z-direction. 

The  Northlist  for  a  box  b  consists  of  those  elements  of  the  interaction  list  which  lie  north 
of  b,  are  separated  by  at  least  one  box  in  the  +y-direction,  and  are  not  contained  in  the  Up  or 
Down  lists. 

The  Southlist  for  a  box  b  consists  of  those  elements  of  the  interaction  list  which  lie  south 
of  b,  are  separated  by  at  least  one  box  in  the  —  y-direction,  and  are  not  contained  in  the  Up  or 
Down  lists. 

The  Eastlist  for  a  box  b  consists  of  those  elements  of  the  interaction  list  which  lie  east  of  b, 
are  separated  by  at  least  one  box  in  the  -hc-direction,  and  are  not  contained  in  the  Up,  Down, 
North,  or  South  lists. 

The  Westlist  for  a  box  b  consists  of  those  elements  of  the  interaction  list  which  lie  west  of  b, 
are  separated  by  at  least  one  box  in  the  — i-direction,  and  are  not  contained  in  the  Up,  Down, 
North,  or  South  lists. 

For  any  box  b,  we  will  denote  the  number  of  elements  in  its  Uplist  by  N(Uplist(b)),  and 
adopt  a  similar  convention  for  each  of  the  remain  five  lists. 


Remark  3.1  It  is  easy  to  verify  that  the  original  interaction  list  is  equal  to  the  union  of  the 
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Figure  2:  The  Uplist  for  the  box  b  (see  Definition  3.5). 


Up,  Down,  North,  South,  East  and  West  lists.  It  is  also  easy  to  verify  for  two  boxes  6,  c  that 
c  G  Uplist{b)  6  €  Doumlist(c), 

c  G  Northli$t(b)  &  be  Southlist(c),  (31) 

c  e  Eastlist(b)  be  Westlist(c). 

Furthermore,  suppose  that  two  boxes  b  and  c  are  of  unit  volume  and  that  c  G  Uplist{b).  Then 
for  any  point  X0  =  (x0,j/o,*o)  G  b  and  any  point  X  =  (x,y,z)  e  c,  the  vector  X  -  X0  = 
(x  -xo,y  —  j/o,  z  —  zo)  satisfies  the  inequality 

1  <z-zq  <4,  0<  \J{x-  xo)2  +  (y  -  |/o)2  <  Ay/2.  (32) 

Note  that  this  is  precisely  the  condition  (25)  in  Lemma  2.8. 

Remark  3.2  When  there  is  no  danger  of  confusion,  we  will  use  Uplistfb)  to  refer  to  the  geo¬ 
metrical  region  defined  by  the  union  of  all  boxes  in  the  Uplist  of  box  6.  This  is  a  slight  abuse 
of  notation,  since  Uplistfb)  is,  strictly  speaking,  a  set  of  boxes.  We  will  take  the  same  liberty 
with  Dovmlist(b),  Northlistfb),  Southlistfb),  Eastlistfb),  Westlist(b)  and  Coll(b). 

3.1  Rotation  Based  Translation  Operators 

In  this  section,  we  describe  a  simple  scheme  for  reducing  the  cost  of  applying  any  of  the  three 
operators  Tmm,  Tml ,  Tll  to  an  arbitrary  vector  from  0(p4)  to  0(p3)  operations.  The  scheme 
is  based  on  the  observation  that  when  a  multipole  or  local  expansion  is  translated  al^ng  the 
z-axis,  the  cost  is  reduced  from  0(p4)  to  0(p3)  (5,  12,  22].  The  following  lemma  is  obtained 
immediately  from  the  resulting  simplification  of  formulae  (13),  (17)  and  (21). 


10 


Lemma  3.1  If,  in  Theorems  2.S,  2.4  and  2.5,  the  spherical  coordinates  of  the  point  Xq  are 
(p,  0, 0),  then  the  formulae  (IS),  (17)  and  (21)  assume  the  form 


~  jL  Ak  5 

n=0 

j~h  (-1)n^+n-^'+n+1  ’  ? 

*  “  ^  (-l)«+i  •  .4* 


(33) 

(34) 

(35) 


n=j 


respectively. 

Definition  3.6  The  special  cases  of  the  linear  operators  Tmm,  Tml,  and  Tll  defined  by  the 
formulae  (33),  (34),  and  (35)  will  be  denoted  by  T£iM,  T£fL,  and  T£L  respectively. 

Observation  3.3  (Rotation  Based  Translation  Operators)  Inspection  of  formulae  (33), 
(34),  (35)  indicates  that  each  of  the  operators  T£jM,  T£fL  and  T£L  can  be  applied  numerically  to 
an  arbitrary  pth-order  expansion  for  a  cost  proportional  to  p3.  Thus,  a  translation  operator  can 
be  applied  to  an  arbitrary  vector  for  a  cost  proportional  to  p3  via  the  following  procedure.  First, 
the  system  of  coordinates  is  rotated  so  that  the  new  z-axis  points  to  the  desired  translation 
center.  Then,  the  expansion  is  translated  via  one  of  the  formulae  (33),  (34)  and  (35).  Finally, 
the  translated  expansion  is  rotated  bade  to  the  original  system  of  coordinates.  Since  each  of 
the  three  stages  costs  0(p3)  operations,  the  cost  of  the  whole  process  has  also  been  reduced  to 
0(p3)  operations.  Formally,  the  scheme  we  have  outlined  corresponds  to  the  factorizations 


Tmm 

—  1)°Tmm°^{^), 

(36) 

Tml 

—  U(fl  ^)  °^MI°^(ii)i 

(37) 

Tll 

=  n^i-l)oT£Lon^i), 

(38) 

where  7 Z(Sl)  is  defined  in  section  2.2  and  7i(Cl  *)  denotes  the  inverse  rotation  operator. 


3.2  Plane  Wave  Based  Translation  Operators 

In  three-dimensional  fast  multipole  schemes,  the  operator  Tml  (converting  multipole  expan¬ 
sions  into  local  ones)  tends  to  be  applied  much  more  frequently  then  the  operators  Tmm,  Tll 
which  shift  multipole  and  local  expansions.  Ignoring  boundary  effects,  one  ends  up  applying 
Tml  to  the  multipole  expansion  for  each  box  about  189  times  when  the  charge  distribution  is 
uniform.  The  operators  Tmm,  Tll,  on  the  other  hand,  are  applied  roughly  once  per  box.  In  the 
algorithm  of  this  paper,  the  operators  Tmm,  Tll  are  applied  via  the  order  p3  scheme  described 
in  the  preceding  section;  Tml  is  applied  by  means  of  a  much  more  complicated  procedure, 
involving  the  plane  wave  representation  introduced  in  on  Lemma  2.8  of  section  2.3. 

The  following  observation  provides  an  expansion  of  the  form  (29)  for  the  potential  generated 
by  a  collection  of  charges.  It  is  an  immediate  consequence  of  Lemma  2.8. 
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Observation  3.4  Suppose  that  N  charges  of  strengths  are  located  at  points 

Xi,  X%,  •  •  •,  XN  in  R3  with  Cartesian  coordinates  (si,j/i,«i),  (x2,  j/2^2))  — »  {xn,Vn,zn),  re¬ 
spectively.  Suppose  further  that  all  points  Xi,X^,  ...,Xff  are  inside  a  cubic  box  b  with  unit 
volume  centered  at  the  origin  and  that  X  =  ( x,y,z )  G  R3  such  that  X  €  Upltst(b).  Let  4>(X) 
denote  the  potential  generated  by  the  charges  qi, ...,  qN  and  let  be  defined  by  the  formula 

*(e)  Mk 

Ve(X)  =  Y,  £  W(k,j)  ■  e~XkZ  •  (39) 

k=lj-l 

with  the  coefficients  W(k,j)  given  by  the  formula 

N 

W(k,j)  =  —  V  q{  ■  eXkZ‘  •  c-*^**(*rcos(aif*)+w  sm(ai,fc))j  (4q) 

for  all  A:  =  1, s(e),j  =  1, Af*.  Then,  if  A  =  ]C£Li  |©!»  we  have  the  estimate 

|$(*)-¥e(X)|<Ae.  (41) 

Observation  3.5  A  somewhat  involved  analysis  shows  that,  under  the  conditions  of  the  pre¬ 
ceding  observation,  s(e)  ~  p,  where  p  is  chosen  according  to  (7)  to  achieve  the  same  accuracy 
using  a  multipole  expansion.  Likewise,  the  total  number  of  exponential  basis  functions  Sexp  in 
(39)  is  of  the  same  order  as  the  total  number  of  multipole  moments  (p2)  in  (7)  in  order  that 
the  two  expansions  provide  the  same  precision  e. 

Expansions  of  the  form  (39)  will  be  referred  to  as  exponential  expansions.  Their  main  utility 
is  that  translation  takes  a  particularly  simple  form. 

Theorem  3.2  (Diagonal  translation)  Suppose  that  a  function  \t£(X)  :  R3  *->•  C  is  defined 
by  the  formula  (39),  which  we  view  as  an  expansion  centered  at  the  origin  for  X  =  ( x,y,z ). 
Then,  for  any  vector  Xq  —  (xo,!/o,^o)  €  R3,  we  have  the  shifted  expansion 

*(e)  Mk 

We(X)  =  £  £  V(k>j)  ‘  •  e*^f((*-*o)-coe(ai,t)-Ky-vo).8in(aJ',t)))  (42) 

Jfc=lj=l 

where 


V(k,j)  =  W(k,j)  •  e~XkZ°  -  e*k<xo-coS(aj,k)+yo-*n(aj<k)),  (43) 

for  k  =  l,...,s(e),  j  =  1 Af*. 

Definition  3.7  Formula  (43)  defines  a  linear  operator  mapping  the  coefficients  {W(k,j)}  to 
the  coefficients  {V(k,  j)}.  This  linear  operator  will  be  denoted  by  I?e*p- 
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The  operator  provides  a  tool  for  translating  expansions  of  the  form  (39)  at  a  cost 
of  O(Sexp)  0(j?)  operations.  In  FMM  algorithms,  however,  it  is  convenient  to  be  able 
to  use  multipole  and  local  expansions.  Thus,  in  order  to  be  able  to  use  the  operator  T>exp-, 
linear  operators  converting  multipole  expansions  into  exponential  expansions  and  exponential 
expansions  into  local  expansions  have  to  be  constructed.  The  following  two  theorems  provide 
such  operators. 


Theorem  3.3  Suppose  that  N  charges  of  strengths  91,92,  •  •  •  ,qtf  are  located  inside  a  box  b  of 
volume  d3  centered  at  the  origin,  e  is  a  positive  real  number  and  p  is  an  integer  such  that  for 
any  point  X  6  Uplist(ft)  with  spherical  coordinates  (r,0,  <f>),  the  potential  $(X)  generated  by 
the  charges  91, 92,  * ",  QN  satisfies  the  inequality 


01 


♦m-E  E  ;Sr  •!?*(».« 


n=Om=-n 


<  e. 


(44) 


Then 


*(e)  Mk 


$(X)  -  ^  ^ W(k,j)  - e~(At/d)'2  •  e‘(At/rf)-(*  C08(ai.‘)+J'-8in(aJ.k)) 
k=lj=l 

where  ( x,y,z )  are  the  Cartesian  coordinates  of  X,  A  =  Y1L1  |9i|,  and 


W(kJ)  =  ^  E  H)H  •  ****  E 


0? 


Mk 

*  m=— p 


nfj^l  >/(n-m)!(n  +  m)! 


<(A/d  +  l)-e,  (45) 


(A k/d)n,  (46) 


for  k  —  1, ...,  s(e),  j  1, 


Definition  3.8  Formula  (46)  defines  a  linear  operator  converting  the  coefficients  {0{J*}  into 
the  coefficients  {W(k,j)}.  This  linear  mapping  will  be  denoted  by  Cmx- 


Theorem  3.4  Suppose  that  N  charges  of  strengths  91,92,  •  •  •  ,9at  are  located  inside  a  box  b 
of  volume  d3  centered  at  the  origin,  e  is  a  positive  real  number,  and  that  for  any  point  X  = 
( x,y,z )  €  Uplist(6),  the  potential  ${X)  generated  by  the  charges  91,92, •  •  • ,9n  satisfies  the 
inequality 

«(«)  Mk 
Jk=lj=l 

where  A  —  YaLi  |®|*  Then  there  exists  an  integer  p,  such  that 

*(X)-E  E  C  O«,«  r“  <(A/d+ 1).«, 

n=Om=-n 

where  (r,  0,  <j>)  are  the  spherical  coordinates  of  X  and 

«(«)  Mk 

3=1 


<{A/d)-e,  (47) 


(48) 


H)W 

1,1  \/(rr^m)](Tr:rrnj!  j^v 


(49) 


for  n  =  0,  ...,p,  m  =  — n, ...,  n. 
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Definition  3.9  Formula  (49)  defines  a  linear  operator  converting  the  coefficients  {W(k,j)} 
into  the  coefficients  {L™}.  This  linear  mapping  will  be  denoted  by  Cxl- 


Remark  3.6  It  is  easy  to  see  that  (46)  can  be  evaluated  numerically  for  k  =  1, ...,  s(e),j  = 
1, ...,  Mk,  at  a  cost  proportional  to  p3.  Indeed,  we  first  calculate  (2 p  +  1)  •  s(e)  quantities  Fk,m 
defined  by  the  formula 


Fk,r 


P 


oz 


m)!(n  +  m)! 


(A*/d)n, 


(50) 


for  k  =  1, ...,  s(e),  m  =  — p,...,p.  This  step  requires  0(s(e)  -p2)  operations.  We  then  evaluate 
the  coefficients  W(k,j)  via  the  formula 


W(k,j) 


Wk/d 

Mk 


p 

(-t)|m|  •  eim  ai'k  • 

m=-p 


Fk,mi 


(51) 


for  k  —  1, s(e),  j  =  l,...,Mk,  at  a  cost  of  0{Sexp  • p )  operations.  Thus,  the  total  cost  of 
applying  the  operator  Cmx  numerically  to  a  pth-order  multipole  expansion  is 

Cost  (Cm x)  ~  0(p2s(e)  +  pSexp )  ~  0(p3),  (52) 

making  use  of  Observation  3.5.  A  similar  argument  shows  that  the  operator  Cxl  can  be 
evaluated  numerically  for  a  cost  proportional  to  p3. 


The  proofs  of  Theorems  3.2,  3.3,  3.4  can  be  found  in  [12].  The  following  observation  follows 
immediately  from  Theorems  3.2,  3.3  and  3.4. 


Observation  3.7  (Multipole  to  local  translation  for  the  Uplist)  Suppose  that  b,c  are 
two  boxes  such  that  c  is  in  the  Uplist  of  b.  Then  the  translation  operator  Tml  which  converts 
a  multipole  expansion  centered  in  b  to  a  local  expansion  centered  in  c  can  be  applied  via  the 
following  procedure.  First,  convert  the  multipole  expansion  centered  in  b  into  an  exponential 
expansion  via  the  operator  Cmx >  then,  use  the  operator  'DeXp  to  translate  the  resulting  ex¬ 
ponential  expansion  to  the  center  of  box  c,  finally,  convert  the  latter  expansion  into  a  local 
expansion  in  box  c  via  the  operator  Cxl-  In  short, 

Tml  =  Cxl  0  Z>e*p  °  Cmx-  (53) 

Observation  3.8  (Multipole  to  local  translation:  general  case)  The  decomposition  (53) 
of  the  operator  Tml  is  valid  only  when  box  c  is  in  the  Uplist  of  box  6.  When  box  c  is  not  in 
the  Uplist  of  box  6,  the  operator  Tml  can  easily  be  applied  by  first  rotating  the  system  of 
coordinates,  so  that  in  the  new  coordinate  system,  box  c  lies  in  the  Uplist  of  box  6,  applying 
the  operator  Tml  via  (53)  to  the  rotated  expansion,  and  finally  rotating  bade  to  the  original 
system  of  coordinates.  Formally,  this  corresponds  to  the  factorization 

Tml  —  T,(Q.~X)  o  Cxl  °  0  Cmx  °  71(0,).  (54) 

The  rotation  operators  1l(Q,)  are  described  in  section  2.2. 


14 


Multipole  Local 


Exponential  Exponential 


Figure  3:  A  large  number  of  multipole-to-local  translations,  each  costing  0(p3)  operations  are 
replaced  by  a  single  multipole-to-exponential  operator  costing  0(p3)  operations,  a  large  number  of 
exponential  translations  costing  0(p 2)  operations,  and  a  single  exponential-to-local  operator  costing 
0(p3)  operations. 


Remark  3.9  As  mentioned  earlier,  application  of  the  translation  operators  7 ml  ^  a  dominant 
part  of  FMM  algorithms,  occurring  up  to  189  times  per  booc.  Naive  application  of  these  oper¬ 
ators  results  in  a  cost  of  roughly  189  •  p4  operations  per  box,  which  is  prohibitively  expensive 
in  most  cases.  Fast  rotation-based  schemes  [5,  22,  12]  use  Observation  3.3  to  reduce  the  cost 
to  roughly  189  •  3  •  p3  operations  per  box;  the  resulting  FMM  schemes  are  fairly  efficient  in 
low-precision  applications.  Theorems  3.2,  3.3,  3.4  of  this  subsection  can  be  used  to  reduce 
the  cost  of  application  of  the  operators  Tml  to  approximately  20  -p3  + 189  -p2  operations  per 
box.  Indeed,  in  order  to  account  for  the  interaction  of  box  6  with  its  Uplist  boxes,  we  use 
the  operator  Cmx  of  Theorem  3.3  to  convert  6’s  multipole  expansion  into  an  exponential  one 
for  a  cost  proportional  to  p3.  We  then  use  the  operator  of  Theorem  3.2  to  translate  the 
resulting  exponential  expansion  to  each  of  the  boxes  in  Uplist (b),  for  a  cost  propotional  to 
N(Uplist(b))  -  p2.  Subsequently,  we  convert  the  accumulated  exponential  expansion  for  each 
box  into  a  local  one  via  the  operator  Cxl  of  Theorem  3.4,  for  a  cost  proportional  to  p3.  This 
procedure  is  illustrated  in  Figure  3.  The  analogous  process  must,  of  course,  be  repeated  for  the 
Dovmlist,  Northlist,  Southlist,  Eastlist,  and  Westiist  For  the  Northlist,  Southlist,  Eastlist,  and 
Westlist  (but  not  for  the  Dovmlist),  there  is  an  additional  cost  proportional  to  2-p3  operations 
per  box  to  rotate  the  coordinate  system,  as  described  in  Observation  3.8.  The  total  cost  for 
each  of  the  six  interaction  lists  is  summarized  in  the  following 


Cost(  Uplist) 
Cost(Dovmlist) 
Cost(Northlist) 
Cost(Southlist) 


~  2-p3  +  N( Uplist(b))  -  p2, 

~  2  •  p3  4-  N (Dovmlist (6))  •  p2, 
~  4  •  p3  +  N (Northlist (b))  -  p2, 
~  4  •  p3  +  N(Southlist(b))  •  p2, 


(55) 
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Cost(Eastlist)  ~  4-p3  +  N(Eastlist(b))  -p2, 

Cost(  Westlist)  ~  4  •  p3  +  N(  Westlist(b))  ■  p2, 

respectively.  Combining  (55)  with  the  fact  that  the  maximum  total  number  of  boxes  in  the 
interaction  list  is  189,  we  obtain 

Cost{TML)  ~  20  •  p3  + 189  -  p2.  (56) 

Remark  3.10  The  procedure  of  the  preceding  section  has  been  further  accelerated.  First, 
symmetry  considerations  can  be  used  to  reduce  number  of  translations  per  box  from  189  to 
40  without  any  loss  of  precision.  We  refer  the  reader  to  [12]  for  details.  Second,  while  the 
expansions  (5)  and  (8)  are  expressed  in  terms  of  spherical  harmonics,  they  are  being  used  to 
represent  potentials  inside  or  outside  of  regions  that  are  cubic  in  shape.  Clearly,  spherical 
harmonics  are  not  an  optimal  basis  for  this  purpose.  Special-purpose  harmonics  have  been 
developed  for  the  representation  of  potentials  in  such  regions;  they  have  been  incorporated 
in  our  implementation  and  the  timings  presented  in  Section  6  below  reflect  this  additional 
improvement.  The  procedure  itself  is  fairly  involved,  and  will  be  reported  at  a  later  date  [6]. 

4  The  non-adaptive  FMM 

In  this  section,  we  describe  the  non-adaptive  FMM  algorithm  of  [12],  combining  the  factoriza¬ 
tion  (54)  of  the  translation  operator  Tml  with  the  factorizations  (36),  (38)  of  the  operators 
Tmm  ,  Tll-  We  present  it  here  as  a  reference  for  the  subsequent  adaptive  procedure.  For  details, 
the  reader  is  referred  to  the  original  paper  [12]. 

In  the  FMM,  the  set  of  all  boxes  at  level  l  is  denoted  by  Bi,  with  Bo  consisting  of  the 
computational  box  itself.  With  each  box  6,  we  associate  fourteen  expansions  about  its  center. 

•  A  multipole  expansion  of  the  form  (5)  represents  the  potential  generated  by  charges 
contained  inside  6;  it  is  valid  in  R3  \  Coll(b). 

•  A  local  expansion  ^b  of  the  form  (8)  represents  the  potential  generated  by  all  charges 
outside  Coll(b);  it  is  valid  inside  box  b. 

•  Six  outgoing  exponential  expansions  Wbp,  Wt£,oum,  W6JVortfc,  WfcSoufA,  and 

of  the  form  (39),  representing  the  potential  generated  by  all  charges  located  in  6  and 
valid  in  Uplist(b ),  Dovmlist(b),  Northlist(b),  Southlist(b),  Eastlist(b),  and  Westlist(b), 
respectively. 

•  Sue  incoming  exponential  expansions  Vbp  VbDown,  VbNorth,  y  South ,  yEast >  yWest  Qf 
the  form  (39),  representing  the  potential  inside  6  generated  by  all  charges  located  in 
Dotmlist(b),  Uplist(b ),  Southlist(b),  Northlist(b),  Westlist[b),  and  Eastlist(b) ,  respec¬ 
tively. 
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NON-ADAPTIVE  FMM  ALGORITHM 


Initialization 

Comment  [Choose  number  of  refinement  levels  NLEV  « log8  N,  and  the  order  p  of  the  multipole 
expansions.  The  number  of  boxes  at  the  finest  level  is  then  8NLEV ,  and  the  average 
number  of  particles  per  box  is  s  =  1V/(8ni'®v).  Denote  the  set  of  all  boxes  at  level  l 
by  Bi .] 

Upward  Pass 
Step  1 


Do  for  each  box  b  €  Bnlev. 

Form  multipole  expansion  of  potential  field  due  to 
particles  in  box  b  at  b's  center,  via  Theorem  2.1. 

End  do 


Step  2 


Do  for  levels  l  —  NLEV  —  1,  ...,2, 

Do  for  each  box  b  €  Bi, 

Form  multipole  expansion  about  the  center  of  b  by 
merging  expansions  from  its  eight  children  via  Theorem  2.3. 
(In  applying  Tmm .  use  the  factorization  of  Observation  3.3.) 
End  do 
End  do 


Downward  Pass 
Initialization 


Set  v&j  =  (0,0,  ...,0)  for  all  boxes. 


Step  SA 


Do  for  levels  1  =  2, NLEV, 

Do  for  each  box  b  6  Bi, 

Form  local  expansion  about  the  center  of  b  by 

using  Theorem  2.5  to  shift  the  local  expansion  of  b's  parent  to  b. 

(In  applying  Tll,  use  the  factorization  of  Observation  3.3.) 

End  do 
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Step  SB 


Do  for  Dir  =  Up,  Down ,  North,  South,  East,  West, 

Do  for  each  box  b  €  Bi, 

Convert  the  multipole  expansion  to  the 
"outgoing”  exponential  W£>ir,  via  Theorem  3.3. 

Do  for  each  box  c  €  Dir  —  list{b), 

Translate  Wj?tT  from  b  to  c  via  Theorem  3.2  and  add  to 
End  do 
End  do 

Do  for  each  box  c  €  Bi, 

Convert  the  incoming  exponential  to  the 
local  expansion  ^c,  via  Theorem  3.4. 

End  do 
End  do 
End  do 


Step  4 

Do  for  each  box  b  G  .Bnl ev> 

For  each  particle  in  box  b,  evaluate  at  the  partide  position. 
End  do 


Step  5 


Do  for  each  box  b  €  5nlev> 

For  each  particle  in  box  b, 

compute  interactions  with  particles  in  b's  colleagues  directly. 
End  do 


5  The  adaptive  FMM 

The  preceding  algorithm  is  efficient  for  reasonably  uniform  distributions  of  particles,  but  its 
performance  deteriorates  significantly  for  non-uniform  distributions.  To  remedy  this  situation, 
we  construct  an  adaptive  version  of  the  scheme.  Our  strategy  follows  closely  that  used  in  [4] 
for  the  two  dimensional  case.  Starting  with  the  computational  box,  we  build  our  structure 
recursively.  If  the  box  under  consideration  contains  no  charges,  its  existence  is  immediately 
forgotten.  If  it  contains  fewer  than  s  charges  (where  s  is  an  appropriately  chosen  positive 
integer),  it  is  not  subdivided  further  and  considered  childless.  Otherwise,  it  is  considered  a 
parent  box  and  subdivided  into  its  eight  children.  The  procedure  is  then  repeated  for  each  of 
the  latter.  As  in  the  nonadaptive  case,  the  set  of  all  nonempty  boxes  at  level  l  is  denoted  by 
Bi,  with  Bo  consisting  of  the  computational  box  itself. 
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5.1  Adaptive  lists 

In  order  to  describe  the  adaptive  scheme,  we  will  need  the  following  additional  lists. 

Definition  5.1  List  1  of  a  childless  box  6,  denoted  by  L\(b),  is  defined  to  be  the  set  consisting 
of  b  and  all  childless  boxes  adjacent  to  b.  If  6  is  a  parent  box,  its  List  1  is  empty. 

Definition  5.2  List  2  of  a  box  6,  denoted  by  £2(6),  is  the  set  consisting  of  all  children  of  the 
colleagues  of  b’s  parent  that  are  well  separated  from  b. 

Definition  5.3  List  3  of  a  childless  box  b,  denoted  by  £3(6),  is  the  set  consisting  of  all  descen- 
dents  of  6’s  colleagues  that  are  not  adjacent  to  b,  but  whose  parent  boxes  are  adjacent  to  b.  If 
b  is  a  parent  box,  its  list  3  is  empty. 

Note  that  any  box  c  in  £3(6)  is  smaller  than  b  and  is  separated  from  b  by  a  distance  not 
less  than  the  side  of  c,  and  not  greater  than  the  side  of  b. 

Definition  5.4  List  4  of  a  box  b ,  denoted  by  £4(6),  consists  of  boxes  c  such  that  b  €  £3(0);  in 
other  words,  c  6  £4(6)  if  and  only  if  6  €  £3(0). 

Note  that  all  boxes  in  £4(6)  are  childless  and  are  larger  than  b. 

Figure  4  shows  the  four  lists  for  a  box  6  in  two  dimensions.  Of  these,  List  1  and  List  2  have 
simple  analogues  in  the  non-adaptive  algorithm  of  Section  4.  Specifically,  List  1  of  some  finest 
level  box  b  would  consist  of  its  colleagues,  whose  interactions  will  be  accounted  for  directly. 
List  2  of  b  would  consist  of  boxes  that  are  of  the  same  size  as  b  and  are  well  separated:  i.e., 
the  interaction  list  of  Definition  3.4.  Lists  3  and  4  do  not  have  analogues  in  the  non-adaptive 
scheme. 

£2(6)  is  subdivided  further  into  Uplist(b),  Downlist(b ),  Northlist(b),  Southlist(b ),  Eastlist(b), 
and  Westlist(b),  by  obvious  analogy  with  Definition  3.5. 

With  each  box  6,  we  also  associate  fourteen  expansions  by  analogy  with  those  described  in 
section  4.  The  only  difference  is  that  the  multipole  expansion  is  validinR3\(£i(6)U£3(6)). 
Similarly,  the  local  expansion  represents  the  potential  inside  b  generated  by  all  charges 
outside  £1(6)  U  £3(6). 


ADAPTIVE  FMM  ALGORITHM 


Initialization 

Choose  precision  e  and  the  order  of  the  multipole  expansions  p.  Choose  the  maximum  number  s 
of  charges  allowed  in  a  childless  box.  Define  Bo  to  be  the  smallest  cube  containing  all  sources  (the 
computational  domain). 
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Figure  4:  Lists  1-4  for  box  b 

Build  Tree  Structure 
Step  0 


Do  for  levels  1  =  0,1, 2, ... 

Do  for  each  box  b  €  £/ 

If  6  contains  more  than  s  charges  then 

Divide  b  into  eight  child  boxes.  Ignore  empty  children 
and  add  the  nonempty  child  boxes  to  2?j+1. 

End  if 
End  do 
End  do 

Comment  [Denote  the  greatest  refinement  level  obtained  above  by  NLEV  and  the  total  number 
of  boxes  created  as  NBOX.  Create  the  four  lists  for  each  box.] 

Do  for  each  box  bi,  i  =  1, 2, NBOX 

Create  lists  £i(i>i),  £2(6*),  £3  (*>»),  LA(bi). 

Split  £2(61)  into  Up,  Down,  North,  South,  East,  West  lists. 

End  do 


Upward  Pass 
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Comment  [During  the  upward  pass,  a  pth-order  multipole  expansion  is  formed  for  each  box  b  about 
its  center,  representing  the  potential  in  R3  \  (Za(6)  U  £3(6))  due  to  all  charges  in  6.] 

Step  1 

Comment  [For  each  childless  box  b,  form  a  multipole  expansion  about  its  center  from  all  charges 
in  b.] 

Do  for  each  box  6j,i  =  1,2,  ...,NBOX 
If  bi  is  childless  then 

Use  Theorem  2.1  to  form  pth-order  multipole  expansion  $t,it 
representing  the  potential  in  R3  \  (Li(b)  ULs(b))  due  to  all  charges  in  fc,. 

End  if 
End  do 


Step  2 

Comment  [For  each  parent  box,  form  a  multipole  expansion  about  its  center  by  merging  multipole 
expansions  from  its  children.] 

Do  for  levels  l  =  NLEV  -  1,NLEV  -  2,  ...,0 
Do  for  each  box  b  G  B\ 

If  &  is  a  parent  box  then 

Use  the  operator  Tmm  to  merge  multipole  expansions  from 
its  children  into 
End  if 
End  do 
End  do 


Downward  Pass 

Comment  [During  the  downward  pass,  a  pth-order  local  expansion  is  generated  for  each  box  6  about 
its  center,  representing  the  potential  in  b  due  to  all  charges  outside  (Li(b)  U  £3(6)).] 

Step  S 

Comment  [For  each  box  b,  add  to  its  local  expansion  the  contribution  due  to  charges  in  £4(6).] 

Do  for  each  box  6j,  t  =  1, 2,  •  •  • ,  NBOX 
Do  for  each  box  c  €  £4(61) 

If  the  number  of  charges  in  bi  <  p2  then 

Comment  [The  number  of  charges  in  b{  is  small.  It  is  faster  to  use  direct  calculation 
than  to  generate  the  contribution  to  the  local  expansion  due  to  charges 
in  c;  act  accordingly.] 
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Calculate  potential  field  at  each  particle  point  in  bi 
directly  from  charges  in  c. 

Else 

Comment  [The  number  of  charges  in  bi  is  large.  It  is  faster  to  generate  the  contribution 
to  the  local  expansion  due  to  charges  in  c  than  to  use  direct  calculation; 
act  accordingly.] 

Generate  a  local  expansion  at  Vs  center  due  to 
charges  in  c,  and  add  to  $6,.. 

End  if 
End  do 
End  do 

Step  4 

Comment  [For  each  box  b  on  level  l  with  l  =  2,3,...,NLEV  and  for  each  direction  Dir  = 
Up,  Down,  North,  South,  East,  West,  create  from  box  b's  multipole  expansion  the  out¬ 
going  exponential  W^*r  in  direction  Dir,  using  the  operator  Cmx-  Translate  Wfir 
to  the  center  of  each  box  c  €  Diriist(b)  using  Corollary  3.2,  and  add  the  translated 
expansions  to  its  incoming  exponential  expansion  V£Dir.] 

Do  for  levels  1  =  2,3, ...,  NLEV 

Do  for  Dir  =  Up,  Down,  North,  South,  East,  West 
Do  for  each  box  6  €  B[ 

Use  the  operator  Cmx  to  convert  multipole  expansion 
$6  into  exponential  Wfir. 

Do  for  each  box  c  €  Dirlist(b) 

Translate  the  outgoing  exponential  expansion  to  the  center  of  box  c 
using  the  diagonal  translation  operator  Dxx<  and  add  the  translated 
expansion  to  the  incoming  exponential  expansion  V£Ht . 

End  do 
End  do 

Comment  [For  each  box  c  on  level  l,  convert  the  exponential  expansion  into  a 
local  expansion  and  add  it  to  ^e.] 

Do  for  each  box  c  €  £/ 

Use  the  operator  Cxl  to  convert  the  exponential  expansion  VPiT 
into  a  local  expansion,  and  add  it  to  ^c. 

End  do 
End  do 
End  do 
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Step  5 

Comment  [For  each  parent  box  b,  shift  the  center  of  its  local  expansion  to  its  children.] 

Do  for  each  box  bi ,  i  =  1, 2,  •  •  • ,  NBOX 
If  bi  is  a  parent  box  then 

Use  the  operator  Tll  to  shift  the  local  expansion  ^  to  the  centers  of  its 
children,  and  add  the  translated  expansions  to  children's  local  expansions. 
End  if 
End  do 


Evaluation  of  Potentials 

Step  6 

Comment  [Include  contribution  to  potential  from  local  expansion  at  leaf  nodes.] 

Do  for  each  box  bi,i  —  1,2,..., NBOX 
If  bi  is  childless  then 

Calculate  the  potential  at  each  charge  in  bi  from  the  local  expansion  'I'*;. 
End  if 
End  do 


Step  7 

Comment  [Include  contribution  from  direct  interactions.] 

Do  for  each  box  bi,i  =  1,2,...,  NBOX 
If  bi  is  childless  then 

Calculate  the  potential  at  each  charge  in  bi 
directly  due  to  all  charges  in  Li(bi). 

End  if 
End  do 


Step  8 

Comment  [For  each  childless  box  b,  evaluate  the  potential  due  to  all  charges  in  1^(6).] 

Do  for  each  box  bi,i  =  1,2,...,  NBOX 
If  bi  is  childless  then 

Do  for  each  box  c  €  Lz(bi) 

If  the  number  of  charges  in  c  <  p2  then 

Comment  [The  number  of  charges  in  c  is  small.  It  is  faster  to  use  direct  calculation 
than  to  evaluate  the  multipole  expansion  act  accordingly.] 
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Calculate  the  potential  at  each  charge  in  bi 
directly  from  charges  in  c. 

Else 

Comment  [The  number  of  charges  in  c  is  large.  It  is  faster  to  evaluate  the  expansion 
$c  than  to  use  direct  calculation;  act  accordingly.] 

Calculate  the  potential  at  each  charge  in  bi 
from  muitipole  expansion  $c. 

End  if 
End  do 
End  if 
End  do 


Remark  5.1  Step  3  in  the  above  algorithm  could  be  simplified  without  increasing  the  asymp¬ 
totic  CPU  time  estimate  of  the  latter.  Specifically,  we  could  always  generate  the  contribution 
to  the  local  expansion  'J'j  due  to  charges  in  c,  even  when  the  number  of  charges  in  c  is  small. 
However,  the  actual  computation  time  would  increase  somewhat.  A  similar  observation  can  be 
made  about  Step  8  of  the  above  algorithm. 

Remark  5.2  In  the  actual  implementation  of  the  adaptive  algorithm,  we  have  introduced 
several  minor  modifications,  designed  primarily  to  reduce  the  memory  requirements  of  the 
scheme.  In  particular,  Steps  3,  4,  and  5  of  the  downward  pass  have  been  combined  to  eliminate 
some  of  the  intermediate  storage. 

5.2  Complexity  Analysis  and  Comparison  with  Tree  Codes 

The  cost  of  the  FMM  algorithm  of  this  paper  (like  the  cost  of  older  schemes  of  this  type)  can 
be  separated  into  two  parts.  The  first  part  concerns  the  construction  of  the  data  structure 
(Step  0)-,  the  second  part  concerns  the  calculation  of  the  potentials. 

If  N  denotes  the  total  number  of  particles  in  the  system,  the  CPU  time  estimate  for  the 
first  part  is  0(N  log  N)  in  the  general  case  and  0(N)  for  reasonably  uniform  distributions  of 
particles,  where  “bin  sorting”  can  be  used  instead  of  the  recursive  procedure  outlined  above. 
The  CPU  time  requirements  for  the  second  part  are  O(N)  in  all  cases.  In  practice,  however, 
the  first  part  uses  a  negligible  proportion  of  the  total  CPU  time. 

There  has  been  some  confusion  in  the  literature  concerning  computational  complexity, 
partly  because  of  an  erroneous  proof  in  the  original  paper  [4]  addressing  the  two  dimpnsirmal 
case.  A  correct  proof  can  be  found  in  [17],  under  very  general  assumptions  about  the  distribu¬ 
tion  of  charges.  We  omit  the  detailed  analysis  of  the  asymptotic  time  and  storage  estimates  for 
the  algorithm  of  this  paper  since  it  does  not  differ  materially  from  that  in  [17].  For  reasonably 


24 


uniform  distributions,  it  is  easy  to  see  that  the  asymptotic  cost  of  the  nonadaptive  algorithm 
is  approximately 

27 N  s  +  2Np2  + 189  —  p2  +  20—  p3, 
s  s 

where  s  is  the  number  of  charges  per  box  at  the  finest  level.  The  first  term  comes  from  direct 
interactions  with  colleagues,  the  second  comes  from  forming  and  evaluating  multipole  and  local 
expansions  at  the  finest  level,  and  the  last  two  come  from  multipole-to-local  translations,  as 
shown  in  (56).  Using  symmetry  considerations,  it  is  possible  to  reduce  the  factor  189  to  40 
(see  Remark  3.10  above).  Setting  s  ~  p3/2,  we  see  that  the  work  required  by  the  nonadaptive 
FMM  is  of  the  order 

0{Np3/2). 

Similarly,  the  storage  costs  are  of  the  order 

0(—p2)  ~  0(Np3/2). 
s 

In  the  adaptive  case,  precise  estimates  are  more  involved,  but  the  reader  will  note  in  the 
numerical  examples  below  that  both  CPU  times  and  storage  requirements  are  at  a  maximum 
for  the  most  homogenous  distributions. 

A  second  area  where  there  has  been  some  confusion  concerns  comparisons  of  the  FMM  with 
what  are  generally  known  as  “tree  codes.”  These  were  introduced  independently  of  the  FMM 
by  Barnes  and  Hut  [2].  (A  related  scheme  by  Appel  [1]  is  more  like  the  FMM  than  like  a  tree 
code.)  In  tree  codes,  all  interactions  are  computed  by  either  direct  calculation  or  by  evaluation 
of  a  multipole  expansion  for  a  source  box  at  a  well-separated  target  position.  Within  the  FMM, 
however,  one  has  four  options  for  a  source  box  b  and  a  target  box  c: 

1.  compute  interactions  directly, 

2.  evaluate  the  multipole  expansion  for  b  at  individual  targets  in  c  directly, 

3.  convert  the  field  due  to  each  source  in  b  to  a  local  expansion  in  c  (which  is  later  evaluated), 

4.  convert  the  multipole  expansion  in  b  to  a  local  expansion  in  c  (which  is  later  evaluated). 

A  properly  implemented  FMM  always  selects  the  least  expensive  option  (which  is  trivial  to 
choose);  thus,  it  is  always  more  efficient  than  a  tree  code.  We  omitted  this  decision  analysis 
in  our  original  descriptions  of  the  FMM  [10,  11,  18]  in  order  to  focus  on  the  central  result, 
which  is  option  4  above.  It  is  this  option  which  reduces  the  cost  to  0(N).  It  is  easy  to  see  that 
options  2  and  3  are  appropriate  only  in  Steps  3  and  8  above,  when  considering  Lists  3  and  4. 
The  analogues  of  Steps  3  and  8  here  are  Stages  5  and  6  in  [4]. 


25 


6  Numerical  Results 


The  algorithm  described  in  Section  5  has  been  implemented  in  Fortran  77,  and  numerical 
experiments  have  been  carried  out  for  a  variety  of  charge  distributions  using  a  Sun  UltraSPARC 
workstation  with  a  CPU  clock  rate  of  167  MHz.  The  results  of  our  experiments  are  summarized 
in  Tables  1-12,  with  all  timings  given  in  seconds. 

In  the  first  set  of  our  experiments,  the  charges  were  distributed  randomly  but  uniformly  in 
the  cube  [—0.5, 0.5]  x  [—0.5, 0.5]  x  [—0.5, 0.5];  results  are  reported  in  Tables  1-3.  In  the  second 
set,  the  charges  were  distributed  randomly  in  the  polar  angles  6  and  <f>  on  the  surface  of  a  sphere 
of  radius  0.5,  centered  at  the  origin.  Obviously,  such  a  distribution  is  concentrated  at  the  poles 
(Figure  5);  results  are  reported  in  Thbles  4-6.  In  the  third  set,  the  charges  were  distributed 
on  the  surface  of  a  cylinder  with  height  1.0  and  radius  0.05  (Figure  6);  results  are  reported 
in  Tables  7-9.  In  the  final  set  of  experiments,  the  charges  were  distributed  on  a  complicated 
surface  shown  in  Figure  7.  The  results  for  this  configuration  are  reported  in  Tables  10-12.  In 
all  our  experiments,  the  charge  strengths  were  taken  randomly  from  the  interval  (—0.5, 0.5). 

For  each  geometry,  the  numerical  tests  were  performed  with  three-,  six-,  and  nine-digit 
accuracy.  For  three-digit  accuracy,  the  maximum  number  of  charges  allowed  in  a  childless  box 
was  set  to  be  40.  Corresponding  numbers  for  six-  and  nine-digit  accuracies  are  100  and  180, 
respectively.  The  timings  produced  by  the  adaptive  FMM  algorithm  were  compared  with  those 
obtained  by  the  direct  calculation.  Obviously,  it  was  not  practical  to  apply  the  direct  scheme 
to  large-scale  ensembles  of  particles,  due  to  excessive  computation  times.  Thus,  the  direct 
algorithm  was  used  to  evaluate  the  potentials  at  the  first  100  elements  of  the  ensemble,  and  the 
resulting  CPU  time  was  extrapolated.  Similarly,  the  accuracy  of  the  algorithm  was  calculated 
at  the  first  100  particles  via  formula  (57)  below. 

The  tables  are  organized  as  follows. 

1.  The  first  column  lists  the  number  of  charges  used  in  the  calculation. 

2.  The  second  column  lists  the  number  of  levels  used  in  the  multipole  hierarchy. 

3.  The  third  column  lists  the  order  of  the  multipole  expansion  used. 

4.  The  fourth  column  lists  the  corresponding  number  of  exponential  basis  functions. 

5.  The  fifth  column  lists  the  amount  of  storage  used  by  the  adaptive  FMM  algorithm.  In  the 
three-  and  six-digit  cases,  we  indicate  the  number  of  single  precision  (REAL*4)  words 
used,  while  in  the  nine-digit  case,  we  indicate  the  number  of  double  precision  (REAL*8) 
words  used. 

6.  Columns  six  and  seven  contain  the  CPU  times  required  by  the  adaptive  FMM  and  the 
direct  calculation,  respectively.  In  the  three-  and  six-digit  cases,  both  the  FMM  and 
the  direct  calculations  were  performed  in  single  precision;  in  the  nine-digit  case,  both 
calculations  were  performed  in  double  precision. 
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7.  Column  eight  lists  the  L2  norm  of  the  error  in  the  FMM  approximation,  which  is  computed 
via  the  formula 


V  Zhw*i)\2  ) 


(57) 


where  $(x$)  axe  potentials  obtained  by  the  FMM  algorithm  and  $(x»)  are  potentials 
computed  by  direct  calculation  in  double  precision. 


Table  1:  Timing  results  for  the  FMM  for  3-digit  of  accuracy  with  charges  uniformly  distributed  in 
a  cube.  Calculations  were  performed  in  single  precision. 


N 

P 

El 

Storage 

Bill 

WKSM 

Error 

■grjgjl 

4 

2267 

m 

52 

1359822 

13.3 

233 

7.9  •  lO"4 

mm 

4 

4681 

m 

52 

3365896 

24.7 

1483 

5.2  •  lO"4 

200000 

5 

33749 

m 

52 

24789948 

158 

24330 

8.4  •  10"4 

500000 

5 

37449 

10 

52 

28835176 

268 

138380 

jr^all 

1000000 

6 

48324 

10 

52 

34798506 

655 

563900 

7.1  •  10-4 

Table  2:  Timing  results  for  the  FMM  for  6-digit  of  accuracy  with  charges  uniformly  distributed  in 
a  cube.  Calculations  were  performed  in  single  precision. 


N 

P 

@9 

Storage 

— 

Tjym 

Error 

3 

585 

El 

258 

1057852 

15.9 

233 

4 

2065 

258 

3383488 

69 

1483 

’  4 

4681 

19 

258 

8220716 

198 

24330 

iZjj 

5 

36665 

19 

258 

64326704 

586 

138380 

4.4  •  10“7 

1O00000 

5 

37449 

19 

258 

66414780 

1245 

563900 

4.4  •  10~7 

The  following  observations  can  be  made  from  these  tables. 

1.  The  application  of  the  FMM  to  large  scale  three  dimensional  problems  is  within  practical 
reach. 

2.  The  actual  CPU  time  required  by  the  adaptive  FMM  algorithm  grows  approximately 
linearly  with  the  number  of  particles  N. 

3.  The  algorithm  breaks  even  with  the  direct  calculation  at  about  N  —  750  for  three-digit 
precision,  N  =  1500  for  six-digit  precision  and  N  =  2500  for  nine-digit  precision. 

4.  The  performance  of  the  algorithm  is  quite  insensitive  to  the  distribution  of  charges. 
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Table  3:  Timing  results  for  the  FMM  for  9-digit  of  accuracy  with  charges  uniformly  distributed  in 
a  cube.  Calculations  were  performed  in  double  precision. 


N 

Levels 

Boxes 

P 

IS 9 

Storage 

Error 

•.  .  T 

3 

585 

29 

Ipjfl 

2012453 

34 

296 

■Mil 

3 

585 

29 

2012453 

96 

1920 

200000 

4 

4681 

29 

670 

16479203 

385 

30800 

Ira  S3 

500000 

4 

4681 

29 

670 

16479203 

1219 

192600 

eesessi 

Table  4:  Timing  results  for  the  FMM  for  3-digit  of  accuracy  with  charges  distributed  on  the  surface 
of  a  sphere.  Calculations  were  performed  in  single  precision. 


N 

Levels 

Boxes 

Lp_ 

ESI 

Storage 

WEM 

Error 

1746 

52 

891080 

8.7 

233 

4.2  •  10"4 

4757 

1 

52 

2394568 

21.6 

1483 

3.6  •  10~4 

200000 

11 

18221 

n 

52 

9126212 

97 

24330 

12 

40717 

Wv 

52 

20413944 

224 

138380 

1000000 

13 

90139 

mil 

52 

45287934 

473 

563900 

Table  5:  Timing  results  for  the  FMM  for  6-digit  of  accuracy  with  charges  distributed  on  the  surface 
of  a  sphere.  Calculations  were  performed  in  single  precision. 


N 

Levels 

Boxes 

[_P 

Stxp 

Storage 

E99 

Error 

jM 

624 

sa 

258 

1037742 

16 

233 

1774 

m 

258 

2774248 

40 

1483 

6790 

19 

258 

10365264 

183 

24330 

2.3  •  lO"7 

10 

18897 

19 

258 

28580428 

529 

138380 

4.3  •  10~7 

1000000 

11 

33289 

19 

258 

50405060 

926 

563900 

2.9  •  10"7 

Table  6:  Timing  results  for  the  FMM  for  9-digit  of  accuracy  with  charges  distributed  on  the  surface 
of  a  sphere.  Calculations  were  performed  in  double  precision. 


N 

Levels 

Boxes 

P 

^39 

Storage 

ESI 

MEM 

Error 

429 

29 

■ 

1422805 

33 

1091 

29 

1 

3616209 

98 

8 

4342 

29 

670 

14394468 

409 

RmK  ffil 

ES3 

10 

9009 

29 

670 

29828865 

1038 

Wmm 1 
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Table  7:  Timing  results  for  the  FMM  for  3-digit  of  accuracy  with  charges  distributed  on  the  surface 
of  a  cylinder.  Calculations  were  performed  in  single  precision. 


N 

Levels 

Boxes 

P 

Storage 

El 

2dir 

Error 

KjYTjl 

6 

1963 

10 

52 

1013298 

8.2 

233 

2.7  •  10"4 

7 

4084 

10 

52 

2014394 

20.8 

1483 

4.0  •  10"4 

200000 

8 

10 

52 

9056494 

93 

24330 

5.1  •  10“4 

500000 

9 

10 

52 

15409424 

194 

138380 

5.1  •  10“4 

1000000 

9 

101374 

10 

52 

49326404 

457 

563900 

4.9  •  10"4 

Table  8:  Timing  results  for  the  FMM  for  6-digit  of  accuracy  with  charges  distributed  on  the  surface 
of  a  cylinder.  Calculations  were  performed  in  single  precision. 


N 

Levels 

Boxes 

P 

Storage 

wsm 

■E09 

Error 

5 

505 

19 

258 

868700 

13.8 

233 

2.5  •  10"7 

6 

2037 

19 

258 

3180832 

39 

1483 

2.9  -  10-7 

7001 

19 

258 

10582852 

143 

24330 

5.6  •  10"7 

19849 

19 

258 

29654956 

508 

138380 

1000000 

29341 

19 

258 

44253336 

921 

563900 

6.4  •  10"7 

Table  9:  Timing  results  for  the  FMM  for  9-digit  of  accuracy  with  charges  distributed  on  the  surface 
of  a  cylinder.  Calculations  were  performed  in  double  precision. 


N 

Levels 

Boxes 

P 

E5I 

Storage 

HW 

■EH 

Error 

5 

505 

29 

1676098 

30 

6 

751 

29 

m  ..  > 

2478241 

86 

1920 

RS  W 

mm 

2515 

29 

670 

8348058 

341 

30800 

mm 

7344 

29 

670 

24250893 

795 

192600 

Table  10:  Timing  results  for  the  FMM  for  3-digit  of  accuracy  with  charges  distributed  as  in  Fig.  7. 
Calculations  were  performed  in  single  precision. 


N 

Levels 

Boxes 

P 

Eli 

Storage 

E£99 

mm 

Error 

20880 

7 

1213 

10 

52 

573996 

6.7 

243 

jgg|giB| 

51900 

8 

4184 

10 

52 

1952046 

17 

1539 

203280 

9 

15423 

10 

52 

7204398 

60 

24730 

3.4 -10“4 

503775 

10 

45837 

10 

52 

21358082 

164 

141060 

3.3  •  10"4 

1007655 

10 

60427 

10 

52 

28513092 

282 

568090 

2.9  •  10"4 

29 


Table  11:  Timing  results  for  the  FMM  for  6-digit  of  accuracy  with  charges  distributed  as  in  Fig.  7. 
Calculations  were  performed  in  single  precision. 


N 

Levels 

S^xp 

Storage 

HS9 

Error 

20880 

■gfl 

19 

258 

1601028 

17 

243 

51900 

19 

258 

2165338 

40 

1539 

203280 

4447 

19 

258 

6697050 

149 

24730 

BEK  5fl 

503775 

15307 

19 

258 

22662792 

323 

141060 

2.6  •  10"7 

1007655 

10 

45784 

19 

258 

67176488 

714 

568090 

2.0  •  10~7 

Table  12:  Timing  results  for  the  FMM  for  9-digit  of  accuracy  with  charges  distributed  as  in  Fig.  7. 
Calculations  were  performed  in  double  precision. 


N 

Levels 

Boxes 

P 

Sexp 

Storage 

E1W 

'MESm 

Error 

20880 

6 

574 

29 

10® 

1856177 

46 

309 

51900 

7 

1191 

29 

■ 

3855741 

101 

2020 

203280 

8 

3883 

29 

670 

12577869 

342 

32050 

6.5  •  IQ"12 

503775 

9 

11499 

29 

670 

37263647 

896 

193900 

»-* 

o 

• 

h-» 

O 

1 

Figure  5:  Charges  distributed  on  the  surface  of  a  sphere. 
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Figure  6:  Charges  distributed  on  the  surface  of  a  cylinder. 


Figure  7:  Charges  distributed  on  a  complicated  object. 


7  Generalizations  and  Conclusions 


We  have  described  an  adaptive  FMM  for  the  Laplace  equation  based  on  a  new  diagonal  form 
for  translation  operators  acting  on  harmonic  functions.  It  is  related  to  the  FMM  for  the 
high-frequency  Helmholtz  equation,  in  the  sense  that  the  latter  is  based  on  diagonal  forms  of 
translation  operators  for  partial  wave  expansions  [7,  19,  20]. 

The  present  scheme  admits  a  number  of  extensions.  The  most  straightforward  ones  are  to 
the  Helmholtz  equation  at  low  frequencies  and  to  the  Yukawa  equation.  The  corresponding 
multipole  expansions  are  well-known,  and  appropriate  plane  wave  representations  have  been 
derived  (see,  for  example,  [13]). 

From  a  more  abstract  perspective,  it  is  worth  noting  that  the  main  improvement  made  in 
this  paper  and  in  [12]  over  earlier  FMMs  is  due  to  the  use  of  one  basis  for  representing  the  far 
field  due  to  a  collection  of  sources  (spherical  harmonics)  and  a  separate  basis  for  translating 
information  between  boxes  in  the  FMM  data  structure  (plane  waves).  The  applicability  of 
this  approach  is  not  limited  to  the  Laplace  and  Helmholtz  equations.  We  are  currently  in 
the  process  of  constructing  such  optimal  (or  nearly  optimal)  bases  for  more  general  potentials, 
including  those  that  do  not  satisfy  a  partial  differential  equation,  but  possess  certain  less 
stringent  analytical  properties.  A  forthcoming  paper  [8]  describes  such  an  algorithm  for  the 
square  root  of  the  Laplacian  in  two  dimensions;  further  generalizations  will  be  reported  at  a 
later  date. 

8  Appendix 

The  three  tables  in  this  Appendix  contain  the  nodes  and  weights  (in  columns  2  and  3)  needed 
for  discretization  of  the  outer  integral  in  Lemma  2.8.  Column  4  contains  the  number  of  dis¬ 
cretization  points  needed  in  the  inner  integral,  which  we  denote  by  M%. 


Table  13:  Nodes,  weights  and  for  3-digit  accuracy. 


k 

Node 

Weight 

1 

0.10934746769000 

0.27107502662774 

4 

2 

0.51769741015341 

0.52769158843946 

8 

3 

1.13306591611192 

0.69151504413879 

16 

4 

1.88135015110740 

0.79834400406452 

16 

5 

2.71785409601205 

0.87164160121354 

24 

6 

3.61650274907449 

0.92643839116924 

24 

7 

4.56271053303821 

0.97294622259483 

8 

8 

5.54900885348528 

1.02413865844686 

4 

32 


Tbble  14:  Nodes,  weights  and  M%  for  6-digit  accuracy. 


II 

Node 

Weight 

turn 

0.05599002531749 

0.14239483712194 

8 

0.28485138101968 

0.31017671029271 

8 

0.66535367065853 

0.44557516683709 

16 

1.16667904805296 

0.55303383994159 

16 

5 

1.76443027413431 

0.63944903363523 

24 

6 

2.44029832236380 

0.70997911214019 

32 

7 

3.18032180991515 

0.76828253949732 

32 

8 

3.97371715777193 

0.81713201141707 

32 

9 

4.81216799410634 

0.85872191623337 

48 

10 

5.68932314511487 

0.89480789582390 

48 

11 

6.60040479444377 

0.92680189417317 

48 

12 

7.54190497469911 

0.95586282708096 

48 

13 

8.51136569298099 

0.98299145008230 

48 

14 

9.50723242759128 

1.00913395385703 

48 

15 

10.52874809650967 

1.03531774600508 

48 

16 

11.57587019602884 

1.06318427913963 

8 

17 

12.65078163968520 

1.10232109521088 

4 

33 


Table  15:  Nodes,  weights  and  M%  for  9-digit  accuracy. 


IE3 

Node 

Weight 

i 

0.03705701953816 

0.09473396337900 

8 

2 

0.19219683859955 

0.21384206006426 

16 

3 

0.46045971214897 

0.32031528543989 

16 

4 

0.82805130101422 

0.41254929390710 

16 

5 

1.28121229944787 

0.49176691815621 

24 

6 

1.80792019276297 

0.55998309037174 

32 

7 

2.39814728074333 

0.61909314036708 

32 

8 

3.04359012306582 

0.67064351982741 

32 

9 

3.73732742924096 

0.71586567032066 

48 

10 

4.47354768940212 

0.75576118553096 

48 

11 

5.24735518169467 

0.79116885492295 

48 

12 

6.05462948620944 

0.82280556212477 

64 

13 

6.89191648795972 

0.85129012269433 

64 

14 

7.75633860708838 

0.87715909928110 

64 

15 

8.64551915195994 

0.90087981520398 

64 

16 

9.55751929613924 

0.92286282936149 

72 

17 

10.49078760616705 

0.94347471535979 

72 

18 

11.44412262341269 

0.96305166489156 

80 

19 

12.41664955395045 

0.98191478773737 

80 

20 

13.40781311788324 

1.00038891281291 

88 

21 

14.41739038894472 

1.01882849188686 

88 

22 

15.44553016867884 

1.03765781507554 

88 

23 

16.49282861241170 

1.05744113465683 

88 

24 

17.56045648926099 

1.07903824697122 

72 

25 

18.65046484106274 

1.10434337868208 

32 

26 

19.76847686619416 

1.14488166506896 

4 
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